Search | VHL Regional Portal

1.

WebGIVI: a web-based gene enrichment analysis and visualization tool.

Sun, Liang; Zhu, Yongnan; Mahmood, A S M Ashique; Tudor, Catalina O; Ren, Jia; Vijay-Shanker, K; Chen, Jian; Schmidt, Carl J.

BMC Bioinformatics ; 18(1): 237, 2017 May 04.

Article in English | MEDLINE | ID: mdl-28472919

ABSTRACT

BACKGROUND: A major challenge of high-throughput transcriptome studies is presenting the data to researchers in an interpretable format. In many cases, the outputs of such studies are gene lists which are then examined for enriched biological concepts. One approach to help the researcher interpret large gene datasets is to associate genes and informative terms (iTerm) that are obtained from the biomedical literature using the eGIFT text-mining system. However, examining large lists of iTerm and gene pairs is a daunting task. RESULTS: We have developed WebGIVI, an interactive web-based visualization tool ( http://raven.anr.udel.edu/webgivi/ ) to explore gene:iTerm pairs. WebGIVI was built with the Cytoscape and Data Driven Document JavaScript libraries and can be used to relate genes to iTerms and then visualize gene and iTerm pairs. WebGIVI can accept a gene list that is used to retrieve the gene symbols and corresponding iTerm list. This list can be submitted to visualize the gene iTerm pairs using two distinct methods: a Concept Map or a Cytoscape Network Map. WebGIVI also supports uploading and visualization of any two-column, tab-separated data. CONCLUSIONS: WebGIVI provides an interactive and integrated network graph of gene and iTerms that allows filtering, sorting, and grouping, which can aid biologists in developing hypotheses based on the input gene lists. In addition, WebGIVI can visualize hundreds of nodes and generate a high-resolution image, which is important for most research publications. The source code can be freely downloaded at https://github.com/sunliang3361/WebGIVI . The WebGIVI tutorial is available at http://raven.anr.udel.edu/webgivi/tutorial.php .

Subject(s)

Data Mining/methods , Genes , Genomics/methods , Software , Internet
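The two-column, tab-separated input mentioned in the WebGIVI abstract can be pictured as an edge list of gene:iTerm pairs. The Python sketch below is not WebGIVI's code: the function names and the sample rows are hypothetical, and it only illustrates how such pairs might be parsed, grouped by iTerm, and filtered before being drawn.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse two-column, tab-separated rows into (gene, iterm) tuples."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank rows
        cols = line.split("\t")
        if len(cols) != 2:
            raise ValueError(f"expected 2 tab-separated columns: {line!r}")
        pairs.append((cols[0].strip(), cols[1].strip()))
    return pairs


def group_by_iterm(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each iTerm to the set of genes paired with it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for gene, iterm in pairs:
        groups[iterm].add(gene)
    return dict(groups)


def filter_by_min_genes(groups: Dict[str, Set[str]], min_genes: int) -> Dict[str, Set[str]]:
    """Keep only iTerms shared by at least `min_genes` genes."""
    return {term: genes for term, genes in groups.items() if len(genes) >= min_genes}


# Hypothetical sample rows, made up for illustration only.
SAMPLE = [
    "HSPA5\tchaperone",
    "HSP90AA1\tchaperone",
    "BAD\tapoptosis",
]

pairs = parse_pairs(SAMPLE)
groups = group_by_iterm(pairs)
shared = filter_by_min_genes(groups, min_genes=2)
print(sorted(shared))
```

Grouping by the second column first keeps the filtering step independent of how the graph is later rendered.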

2.

iPTMnet: Integrative Bioinformatics for Studying PTM Networks.

Ross, Karen E; Huang, Hongzhan; Ren, Jia; Arighi, Cecilia N; Li, Gang; Tudor, Catalina O; Lv, Mengxi; Lee, Jung-Youn; Chen, Sheng-Chih; Vijay-Shanker, K; Wu, Cathy H.

Methods Mol Biol ; 1558: 333-353, 2017.

Article in English | MEDLINE | ID: mdl-28150246

ABSTRACT

Protein post-translational modification (PTM) is an essential cellular regulatory mechanism, and disruptions in PTM have been implicated in disease. PTMs are an active area of study in many fields, leading to a wealth of PTM information in the scientific literature. There is a need for user-friendly bioinformatics resources that capture PTM information from the literature and support analyses of PTMs and their functional consequences. This chapter describes the use of iPTMnet ( http://proteininformationresource.org/iPTMnet/ ), a resource that integrates PTM information from text mining, curated databases, and ontologies and provides visualization tools for exploring PTM networks, PTM crosstalk, and PTM conservation across species. We present several PTM-related queries and demonstrate how they can be addressed using iPTMnet.

Subject(s)

Computational Biology/methods , Databases, Protein , Protein Processing, Post-Translational , Software , Web Browser , Animals , Data Mining/methods , Humans , Mice , Phosphotransferases , Plant Proteins , Protein Binding , Protein Interaction Mapping/methods , Protein Interaction Maps , Rats , Search Engine , User-Computer Interface

3.

miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.

Gupta, Samir; Ross, Karen E; Tudor, Catalina O; Wu, Cathy H; Schmidt, Carl J; Vijay-Shanker, K.

J Biomed Semantics ; 7(1): 9, 2016 04 29.

Article in English | MEDLINE | ID: mdl-27216254

ABSTRACT

BACKGROUND: MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. METHODS: We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. RESULTS: miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD . We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. 
When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list. CONCLUSIONS: miRiaD, a high-performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"

Subject(s)

Biological Ontologies , Data Mining/methods , Disease/genetics , MicroRNAs/genetics , Biomedical Research , Internet , Natural Language Processing , Semantics
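The miRiaD abstract reports an nDCG value for its sentence ranking but does not say which variant of the metric was used. As a rough illustration only, the sketch below computes normalized discounted cumulative gain with the common log2 rank discount; that choice, the function names, and the graded relevance scale are assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: Sequence[float]) -> float:
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance judgments, listed in the system's ranked order.
system_order = [1, 3]
score = ndcg(system_order)
print(round(score, 3))
```

A value of 1.0 means the system's order matches the ideal order of the annotator's judgments; lower values penalize informative sentences that appear further down the list.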

4.

Transcriptome response to heat stress in a chicken hepatocellular carcinoma cell line.

Sun, Liang; Lamont, Susan J; Cooksey, Amanda M; McCarthy, Fiona; Tudor, Catalina O; Vijay-Shanker, K; DeRita, Rachael M; Rothschild, Max; Ashwell, Chris; Persia, Michael E; Schmidt, Carl J.

Cell Stress Chaperones ; 20(6): 939-50, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26238561

ABSTRACT

Heat stress triggers an evolutionarily conserved set of responses in cells. The transcriptome responds to hyperthermia by altering expression of genes to adapt the cell or organism to survive the heat challenge. RNA-seq technology allows rapid identification of environmentally responsive genes on a large scale. In this study, we have used RNA-seq to identify heat stress responsive genes in the chicken male white leghorn hepatocellular (LMH) cell line. The transcripts of 812 genes were responsive to heat stress (p < 0.01) with 235 genes upregulated and 577 downregulated following 2.5 h of heat stress. Among the upregulated were genes whose products function as chaperones, along with genes affecting collagen synthesis and deposition, transcription factors, chromatin remodelers, and genes modulating the WNT and TGF-beta pathways. Predominant among the downregulated genes were ones that affect DNA replication and repair along with chromosomal segregation. Many of the genes identified in this study have not been previously implicated in the heat stress response. These data extend our understanding of the transcriptome response to heat stress with many of the identified biological processes and pathways likely to function in adapting cells and organisms to hyperthermic stress. Furthermore, this study should provide important insight for future efforts to improve species' abilities to withstand heat stress through genome-wide association studies and breeding.

Subject(s)

Carcinoma, Hepatocellular/genetics , Transcriptome/genetics , Animals , Cell Line, Tumor , Chickens , Heat-Shock Response/genetics , Heat-Shock Response/physiology , Hot Temperature

5.

Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system.

Tudor, Catalina O; Ross, Karen E; Li, Gang; Vijay-Shanker, K; Wu, Cathy H; Arighi, Cecilia N.

Database (Oxford) ; 2015, 2015.

Article in English | MEDLINE | ID: mdl-25833953

ABSTRACT

Protein phosphorylation is a reversible post-translational modification where a protein kinase adds a phosphate group to a protein, potentially regulating its function, localization and/or activity. Phosphorylation can affect protein-protein interactions (PPIs), abolishing interaction with previous binding partners or enabling new interactions. Extracting phosphorylation information coupled with PPI information from the scientific literature will facilitate the creation of phosphorylation interaction networks of kinases, substrates and interacting partners, toward knowledge discovery of functional outcomes of protein phosphorylation. Increasingly, PPI databases are interested in capturing the phosphorylation state of interacting partners. We have previously developed the eFIP (Extracting Functional Impact of Phosphorylation) text mining system, which identifies phosphorylated proteins and phosphorylation-dependent PPIs. In this work, we present several enhancements for the eFIP system: (i) text mining for full-length articles from the PubMed Central open-access collection; (ii) the integration of the RLIMS-P 2.0 system for the extraction of phosphorylation events with kinase, substrate and site information; (iii) the extension of the PPI module with new trigger words/phrases describing interactions and (iv) the addition of the iSimp tool for sentence simplification to aid in the matching of syntactic patterns. We enhance the website functionality to: (i) support searches based on protein roles (kinases, substrates, interacting partners) or using keywords; (ii) link protein entities to their corresponding UniProt identifiers if mapped and (iii) support visual exploration of phosphorylation interaction networks using Cytoscape. The evaluation of eFIP on full-length articles achieved 92.4% precision, 76.5% recall and 83.7% F-measure on 100 article sections. 
To demonstrate eFIP for knowledge extraction and discovery, we constructed phosphorylation-dependent interaction networks involving 14-3-3 proteins identified from cancer-related versus diabetes-related articles. Comparison of the phosphorylation interaction network of kinases, phosphoproteins and interactants obtained from eFIP searches, along with enrichment analysis of the protein set, revealed several shared interactions, highlighting common pathways discussed in the context of both diseases.

Subject(s)

Databases, Protein , Diabetes Mellitus , Neoplasm Proteins , Neoplasms , Phosphoproteins , Protein Kinases , Data Mining , Diabetes Mellitus/genetics , Diabetes Mellitus/metabolism , Gene Regulatory Networks , Humans , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Phosphoproteins/genetics , Phosphoproteins/metabolism , Phosphorylation , Protein Kinases/genetics , Protein Kinases/metabolism
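The precision, recall and F-measure reported for eFIP on full-length articles are mutually consistent if the F-measure is read as the balanced harmonic mean (F1). The abstract does not name the variant, so that reading is an assumption; the short sketch below simply checks the arithmetic.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract, expressed as fractions.
precision = 0.924
recall = 0.765

f1 = f_measure(precision, recall)
print(f"{f1 * 100:.1f}%")  # prints "83.7%", matching the reported F-measure
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the lower recall weighs more heavily on the result than the high precision does.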

6.

iSimp in BioC standard format: enhancing the interoperability of a sentence simplification system.

Peng, Yifan; Tudor, Catalina O; Torii, Manabu; Wu, Cathy H; Vijay-Shanker, K.

Database (Oxford) ; 2014, 2014.

Article in English | MEDLINE | ID: mdl-24850848

ABSTRACT

This article reports the use of the BioC standard format in our sentence simplification system, iSimp, and demonstrates its general utility. iSimp is designed to simplify complex sentences commonly found in the biomedical text, and has been shown to improve existing text mining applications that rely on the analysis of sentence structures. By adopting the BioC format, we aim to make iSimp readily interoperable with other applications in the biomedical domain. To examine the utility of iSimp in BioC, we implemented a rule-based relation extraction system that uses iSimp as a preprocessing module and BioC for data exchange. Evaluation on the training corpus of BioNLP-ST 2011 GENIA Event Extraction (GE) task showed that iSimp sentence simplification improved the recall by 3.2% without reducing precision. The iSimp simplification-annotated corpora, both our previously used corpus and the GE corpus in the current study, have been converted into the BioC format and made publicly available at the project's Web site. Database URL: http://research.bioinformatics.udel.edu/isimp/

Subject(s)

Algorithms , Data Mining/methods , Natural Language Processing , Semantics , Internet

7.

An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.

Arighi, Cecilia N; Carterette, Ben; Cohen, K Bretonnel; Krallinger, Martin; Wilbur, W John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy.

Database (Oxford) ; 2013: bas056, 2013.

Article in English | MEDLINE | ID: mdl-23327936

ABSTRACT

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (~1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.

Subject(s)

Data Mining , Education , Databases as Topic , Documentation , Humans , Software , Time Factors

8.

A framework for biomedical figure segmentation towards image-based document retrieval.

Lopez, Luis D; Yu, Jingyi; Arighi, Cecilia; Tudor, Catalina O; Torii, Manabu; Huang, Hongzhan; Vijay-Shanker, K; Wu, Cathy.

BMC Syst Biol ; 7 Suppl 4: S8, 2013.

Article in English | MEDLINE | ID: mdl-24565394

ABSTRACT

The figures included in many biomedical publications play an important role in understanding the biological experiments and facts described within. Recent studies have shown that it is possible to integrate the information that is extracted from figures in classical document classification and retrieval tasks in order to improve their accuracy. One important observation about the figures included in biomedical publications is that they are often composed of multiple subfigures or panels, each describing different methodologies or results. The use of these multimodal figures is a common practice in bioscience, as experimental results are graphically validated via multiple methodologies or procedures. Thus, for a better use of multimodal figures in document classification or retrieval tasks, as well as for providing the evidence source for derived assertions, it is important to automatically segment multimodal figures into subfigures and panels. This is a challenging task, however, as different panels can contain similar objects (e.g., bar charts and line charts) with multiple layouts. Also, certain types of biomedical figures are text-heavy (e.g., images of DNA and protein sequences) and they differ from traditional images. As a result, classical image segmentation techniques based on low-level image features, such as edges or color, are not directly applicable to robustly partition multimodal figures into single modal panels. In this paper, we describe a robust solution for automatically identifying and segmenting unimodal panels from a multimodal figure. Our framework starts by robustly harvesting figure-caption pairs from biomedical articles. We base our approach on the observation that the document layout can be used to identify encoded figures and figure boundaries within PDF files. Taking into consideration the document layout allows us to correctly extract figures from the PDF document and associate their corresponding caption. 
We combine pixel-level representations of the extracted images with information gathered from their corresponding captions to estimate the number of panels in the figure. Thus, our approach simultaneously identifies the number of panels and the layout of figures. In order to evaluate the approach described here, we applied our system to documents containing protein-protein interactions (PPIs) and compared the results against a gold standard that was annotated by biologists. Experimental results showed that our automatic figure segmentation approach surpasses pure caption-based and image-based approaches, achieving a 96.64% accuracy. To allow for efficient retrieval of information, as well as to provide the basis for integration into document classification and retrieval systems, among others, we further developed a web-based interface that lets users easily retrieve panels containing the terms specified in the user queries.

Subject(s)

Biomedical Research , Computational Biology/methods , Computer Graphics , Information Storage and Retrieval/methods , Image Processing, Computer-Assisted

9.

The eFIP system for text mining of protein interaction networks of phosphorylated proteins.

Tudor, Catalina O; Arighi, Cecilia N; Wang, Qinghua; Wu, Cathy H; Vijay-Shanker, K.

Database (Oxford) ; 2012: bas044, 2012.

Article in English | MEDLINE | ID: mdl-23221174

ABSTRACT

Protein phosphorylation is a central regulatory mechanism in signal transduction involved in most biological processes. Phosphorylation of a protein may lead to activation or repression of its activity, alternative subcellular location and interaction with different binding partners. Extracting this type of information from scientific literature is critical for connecting phosphorylated proteins with kinases and interaction partners, along with their functional outcomes, for knowledge discovery from phosphorylation protein networks. We have developed the Extracting Functional Impact of Phosphorylation (eFIP) text mining system, which combines several natural language processing techniques to find relevant abstracts mentioning phosphorylation of a given protein together with indications of protein-protein interactions (PPIs) and potential evidence for the impact of phosphorylation on the PPIs. eFIP integrates our previously developed tools: Extracting Gene Related ABstracts (eGRAB) for document retrieval and name disambiguation, Rule-based LIterature Mining System for Protein Phosphorylation (RLIMS-P) for extraction of phosphorylation information, a PPI module to detect PPIs involving phosphorylated proteins and an impact module for relation extraction. The text mining system has been integrated into the curation workflow of the Protein Ontology (PRO) to capture knowledge about phosphorylated proteins. The eFIP web interface accepts gene/protein names or identifiers, or PubMed identifiers as input, and displays results as a ranked list of abstracts with sentence evidence and summary table, which can be exported in a spreadsheet upon result validation. 
As a participant in the BioCreative-2012 Interactive Text Mining track, the performance of eFIP was evaluated on document retrieval (F-measures of 78-100%), sentence-level information extraction (F-measures of 70-80%) and document ranking (normalized discounted cumulative gain measures of 93-100% and mean average precision of 0.86). The utility and usability of the eFIP web interface were also evaluated during the BioCreative Workshop. The use of the eFIP interface provided a significant speed-up (~2.5-fold) for time to completion of the curation task. Additionally, eFIP significantly simplifies the task of finding relevant articles on PPI involving phosphorylated forms of a given protein.

Subject(s)

Data Mining/methods , Databases, Protein , Phosphoproteins/metabolism , Protein Interaction Maps , Abstracting and Indexing , Documentation , Molecular Sequence Annotation , Phosphorylation , bcl-Associated Death Protein/metabolism
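The document-ranking evaluation in the eFIP abstract cites mean average precision (MAP). The sketch below shows one common way to compute it from binary relevance judgments; the abstract does not specify the exact variant, so normalizing by the number of relevant documents found in the ranked list, as well as the function names and sample judgments, are assumptions made for illustration.

```python
from typing import Sequence


def average_precision(is_relevant: Sequence[bool]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant document."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(is_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Average of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical judgments for two queries, in ranked order.
queries = [
    [True],         # relevant document at rank 1
    [False, True],  # relevant document at rank 2
]
map_score = mean_average_precision(queries)
print(map_score)
```

Unlike nDCG, this measure treats relevance as binary, which is presumably why the two metrics are reported side by side for ranking quality.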

10.

Developing a biocuration workflow for AgBase, a non-model organism database.

Pillai, Lakshmi; Chouvarine, Philippe; Tudor, Catalina O; Schmidt, Carl J; Vijay-Shanker, K; McCarthy, Fiona M.

Database (Oxford) ; 2012: bas038, 2012.

Article in English | MEDLINE | ID: mdl-23160411

ABSTRACT

AgBase provides annotation for agricultural gene products using the Gene Ontology (GO) and Plant Ontology, as appropriate. Unlike model organism species, agricultural species have a body of literature that does not just focus on gene function; to improve efficiency, we use text mining to identify literature for curation. The first component of our annotation interface is the gene prioritization interface that ranks gene products for annotation. Biocurators select the top-ranked gene and mark annotation for these genes as 'in progress' or 'completed'; links enable biocurators to move directly to our biocuration interface (BI). Our BI includes all current GO annotation for gene products and is the main interface to add/modify AgBase curation data. The BI also displays Extracting Genic Information from Text (eGIFT) results for each gene product. eGIFT is a web-based, text-mining tool that associates ranked, informative terms (iTerms) and the articles and sentences containing them, with genes. Moreover, iTerms are linked to GO terms, where they match either a GO term name or a synonym. This enables AgBase biocurators to rapidly identify literature for further curation based on possible GO terms. Because most agricultural species do not have standardized literature, eGIFT searches all gene names and synonyms to associate articles with genes. As many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene, and filtering is applied to remove abstracts that mention a gene in passing. The BI is linked to our Journal Database (JDB) where corresponding journal citations are stored. Just as importantly, biocurators also add to the JDB citations that have no GO annotation. The AgBase BI also supports bulk annotation upload to facilitate our Inferred from electronic annotation of agricultural gene products. All annotations must pass standard GO Consortium quality checking before release in AgBase. 
Database URL: http://www.agbase.msstate.edu/.

Subject(s)

Agriculture , Data Mining/methods , Databases, Genetic , Workflow , Genes, Plant/genetics , Molecular Sequence Annotation , Periodicals as Topic , Quality Control
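The AgBase abstract states that iTerms are linked to GO terms when they match either a GO term name or a synonym. A minimal sketch of that kind of lookup follows. It is not AgBase or eGIFT code: the case-insensitive exact matching, the function names, and the placeholder identifiers (which are deliberately not real GO accessions) are all assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Placeholder entries: identifier -> (term name, synonyms). Hypothetical data.
GO_TERMS: Dict[str, Tuple[str, List[str]]] = {
    "GO:EXAMPLE1": ("apoptotic process", ["apoptosis"]),
    "GO:EXAMPLE2": ("protein phosphorylation", []),
}


def build_label_index(go_terms: Dict[str, Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """Index every name and synonym (lower-cased) to the identifiers that use it."""
    index: Dict[str, Set[str]] = {}
    for go_id, (name, synonyms) in go_terms.items():
        for label in [name, *synonyms]:
            index.setdefault(label.lower(), set()).add(go_id)
    return index


def link_iterms(iterms: Iterable[str], index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, for each iTerm, the sorted identifiers whose name or synonym matches."""
    return {term: sorted(index.get(term.lower(), set())) for term in iterms}


index = build_label_index(GO_TERMS)
links = link_iterms(["Apoptosis", "chaperone"], index)
print(links)
```

An iTerm with no match maps to an empty list, so a curator can still see which terms were checked and found no candidate GO term.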

11.

eFIP: a tool for mining functional impact of phosphorylation from literature.

Arighi, Cecilia N; Siu, Amy Y; Tudor, Catalina O; Nchoutmboube, Jules A; Wu, Cathy H; Shanker, Vijay K.

Methods Mol Biol ; 694: 63-75, 2011.

Article in English | MEDLINE | ID: mdl-21082428

ABSTRACT

Technologies and experimental strategies have improved dramatically in the fields of genomics and proteomics, facilitating analysis of cellular and biochemical processes, as well as of protein networks. Based on numerous such analyses, there has been a significant increase of publications in life sciences and biomedicine. In this respect, knowledge bases are struggling to cope with the literature volume and they may not be able to capture in detail certain aspects of proteins and genes. One important aspect of proteins is their phosphorylated states and their implication in protein function and protein interacting networks. For this reason, we developed eFIP, a web-based tool that helps scientists quickly find abstracts mentioning phosphorylation of a given protein (including site and kinase), coupled with mentions of interactions and functional aspects of the protein. eFIP combines information provided by applications such as eGRAB, RLIMS-P, eGIFT and AIIAGMT, to rank abstracts mentioning phosphorylation, and to display the results in a highlighted and tabular format for a quick inspection. In this chapter, we present a case study of results returned by eFIP for the protein BAD, which is a key regulator of apoptosis that is posttranslationally modified by phosphorylation.

Subject(s)

Computational Biology/methods , Data Mining/methods , Proteins/metabolism , Software , Animals , Humans , Internet , Phosphorylation , Research Report

12.

eGIFT: mining gene information from the literature.

Tudor, Catalina O; Schmidt, Carl J; Vijay-Shanker, K.

BMC Bioinformatics ; 11: 418, 2010 Aug 09.

Article in English | MEDLINE | ID: mdl-20696046

ABSTRACT

BACKGROUND: With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture of a specific gene from documents that mention its names and synonyms. RESULTS: In this paper, we present eGIFT (http://biotm.cis.udel.edu/eGIFT), a web-based tool that associates informative terms, called iTerms, and sentences containing them, with genes. To associate iTerms with a gene, eGIFT ranks iTerms about the gene, based on a score which compares the frequency of occurrence of a term in the gene's literature to its frequency of occurrence in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all gene names, aliases, and synonyms. Since many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene. An additional filtering process is applied to retain those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for a gene is pre-computed and users of eGIFT can search for genes by using a name or an EntrezGene identifier. iTerms are grouped into different categories to facilitate a quick inspection. eGIFT also links an iTerm to sentences mentioning the term to allow users to see the relation between the iTerm and the gene. We evaluated the precision and recall of eGIFT's iTerms for 40 genes; between 88% and 94% of the iTerms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as iTerms. CONCLUSIONS: Our evaluations suggest that iTerms capture highly-relevant aspects of genes. 
Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists survey results of high-throughput experiments, but also annotators to find articles describing gene aspects and functions.

Subject(s)

Data Mining , Genes , Periodicals as Topic , Internet , Software , Terminology as Topic
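The eGIFT abstract describes ranking iTerms by a score that compares how often a term occurs in a gene's literature with how often it occurs in documents about genes in general, but it does not give the formula. The sketch below substitutes a simple stand-in, the difference between the two document-frequency fractions, purely to illustrate the idea; the scoring function, the names, and the toy documents are all hypothetical and should not be read as eGIFT's actual method.

```python
from typing import List, Sequence, Set, Tuple


def document_fraction(term: str, docs: Sequence[Set[str]]) -> float:
    """Fraction of documents (given as sets of terms) that contain `term`."""
    if not docs:
        return 0.0
    return sum(1 for doc in docs if term in doc) / len(docs)


def rank_terms(
    gene_docs: Sequence[Set[str]],
    background_docs: Sequence[Set[str]],
) -> List[Tuple[str, float]]:
    """Rank terms seen in the gene's documents by gene fraction minus background fraction."""
    candidates = set().union(*gene_docs) if gene_docs else set()
    scored = [
        (term, document_fraction(term, gene_docs) - document_fraction(term, background_docs))
        for term in candidates
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy documents, invented for illustration only.
gene_docs = [
    {"apoptosis", "protein", "kinase"},
    {"apoptosis", "protein"},
    {"apoptosis", "cell"},
]
background_docs = [
    {"protein", "cell"},
    {"protein", "kinase"},
    {"cell", "expression"},
    {"protein", "apoptosis"},
]

ranked = rank_terms(gene_docs, background_docs)
print([term for term, _ in ranked])
```

Terms that are common everywhere, such as "protein" in this toy data, end up with scores near or below zero, which is the behavior a background comparison is meant to produce.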
