Search | BVS Regional Portal

Extraction of human kinase mutations from literature, databases and genotyping studies.

Krallinger, Martin; Izarzugaza, Jose M G; Rodriguez-Penagos, Carlos; Valencia, Alfonso.

BMC Bioinformatics ; 10 Suppl 8: S1, 2009 Aug 27.

Article in English | MEDLINE | ID: mdl-19758464

ABSTRACT

BACKGROUND: There is considerable interest in characterizing the biological role of specific protein residue substitutions through mutagenesis experiments. Additionally, recent efforts to detect disease-associated SNPs have motivated both the manual annotation and the automatic extraction of naturally occurring sequence variations from the literature, especially for protein families that play a significant role in signaling processes, such as kinases. Systematic integration and comparison of kinase mutation information from multiple sources, covering literature, manually annotated databases and large-scale experiments, can provide a more comprehensive view of the functional, structural and disease-associated aspects of protein sequence variants. Previously published mutation extraction approaches did not sufficiently distinguish between two fundamentally different categories of variation origin: naturally occurring mutations and mutations induced through in vitro experiments. RESULTS: We present a literature mining pipeline for the automatic extraction and disambiguation of single-point mutation mentions from both abstracts and full-text articles, followed by a sequence validation check that links mutations to their corresponding kinase protein sequences. Each mutation is scored according to whether it corresponds to an induced mutation or a natural sequence variant. We were able to provide direct literature links for a considerable fraction of previously annotated kinase mutations, thus enabling more efficient interpretation of their biological characterization and experimental context. To test the capabilities of the pipeline, we analyzed the mutations in the protein kinase domain of the kinase family. Using our literature extraction system, we recovered a total of 643 mutation-protein associations from PubMed abstracts and 6,970 from a large collection of full-text articles. When compared with state-of-the-art annotation databases and high-throughput genotyping studies, the mutation mentions extracted from the literature overlap to a good extent with the existing knowledgebases, whereas the remaining mentions suggest new mutation records that had not previously been annotated in the databases. CONCLUSION: Using the proposed residue disambiguation and classification approach, we were able to differentiate between natural-variant and mutagenesis types of mutations with an accuracy of 93.88%. The resulting system can help human experts construct a Gold Standard set of literature-derived mutations with minimal manual curation effort, providing direct pointers to relevant evidence sentences. Our system is able to recover mutations from the literature that are not present in state-of-the-art databases. Manual validation by human experts of 100 mutations extracted from PubMed abstracts showed that almost three quarters (72%) of the extracted mutations were correct, and more than half of these had not previously been annotated in databases.
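The abstract does not specify how mutation mentions are recognized or validated. As a rough illustration only, the sketch below assumes two common mention styles (one-letter, e.g. `T790M`, and three-letter, e.g. `Thr790Met`) and implements the kind of sequence check described: a mention is kept only if its wild-type residue matches the protein sequence at that 1-based position. The regular expressions and the names `extract_mutations` and `validate` are hypothetical and are not the authors' pipeline; the natural-versus-induced scoring step is not modeled.

```python
import re

# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "".join(THREE_TO_ONE.values())
THREE_LETTER = "|".join(THREE_TO_ONE)

# Hypothetical mention patterns: one-letter ("T790M") and three-letter ("Thr790Met").
SHORT_RE = re.compile(rf"\b([{ONE_LETTER}])(\d+)([{ONE_LETTER}])\b")
LONG_RE = re.compile(rf"\b({THREE_LETTER})(\d+)({THREE_LETTER})\b")

def extract_mutations(text):
    """Return (wild_type, position, mutant) tuples normalized to one-letter codes."""
    found = [(wt, int(pos), mt) for wt, pos, mt in SHORT_RE.findall(text)]
    found += [(THREE_TO_ONE[wt], int(pos), THREE_TO_ONE[mt])
              for wt, pos, mt in LONG_RE.findall(text)]
    return found

def validate(mutation, sequence):
    """Keep a mention only if its wild-type residue matches the sequence (1-based position)."""
    wt, pos, _ = mutation
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == wt

text = "The gatekeeper T790M substitution and Leu858Arg were studied."
print(extract_mutations(text))  # [('T', 790, 'M'), ('L', 858, 'R')]
print(validate(("K", 2, "A"), "MKT"))  # True
```

Patterns this loose will also match strings that are not mutations, which is one reason a disambiguation step and a sequence check are needed before a mention is trusted.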

Subject(s)

Information Storage and Retrieval/methods , Mutation , Protein Kinases/genetics , Databases, Genetic , ErbB Receptors/genetics , Genomics , Genotype , Humans , Periodicals as Topic , Reproducibility of Results , Sequence Analysis, DNA

PLAN2L: a web tool for integrated text mining and literature-based bioentity relation extraction.

Krallinger, Martin; Rodriguez-Penagos, Carlos; Tendulkar, Ashish; Valencia, Alfonso.

Nucleic Acids Res ; 37(Web Server issue): W160-5, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19520768

ABSTRACT

There is increasing interest in using literature mining techniques to complement information extracted from annotation databases or generated by bioinformatics applications. Here we present PLAN2L, a web-based online search system that integrates text mining and information extraction techniques to systematically access information useful for analyzing genetic, cellular and molecular aspects of the plant model organism Arabidopsis thaliana. Our system facilitates more efficient retrieval of information relevant to heterogeneous biological topics, ranging from biological relationships at the level of protein interactions and gene regulation to sub-cellular locations of gene products and associations with cellular and developmental processes, e.g. cell cycle, flowering, and root, leaf and seed development. Beyond single entities, predefined pairs of entities can also be provided as queries, for which literature-derived relations are returned together with textual evidence. PLAN2L does not require registration and is freely accessible at http://zope.bioinfo.cnio.es/plan2l.
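To illustrate the entity-pair query idea, here is a minimal sketch that returns the sentences in which both query entities co-occur. It assumes a naive regular-expression sentence splitter and exact whole-word name matching; the function names and the example text (with made-up gene names) are hypothetical. The abstract does not describe how PLAN2L extracts relations, and this sketch does not reproduce it.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def pair_evidence(text, entity_a, entity_b):
    """Return the sentences that mention both entities (case-insensitive, whole-word match)."""
    pat_a = re.compile(rf"\b{re.escape(entity_a)}\b", re.IGNORECASE)
    pat_b = re.compile(rf"\b{re.escape(entity_b)}\b", re.IGNORECASE)
    return [s for s in split_sentences(text) if pat_a.search(s) and pat_b.search(s)]

# Toy text with hypothetical gene names.
text = "geneX binds geneY in vitro. geneZ is expressed in roots. geneY and geneZ co-localize."
print(pair_evidence(text, "geneX", "geneY"))  # ['geneX binds geneY in vitro.']
```

Sentence-level co-occurrence is only a candidate filter: it misses relations stated across neighboring sentences and returns sentences where two names appear without any relation between them.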

Subject(s)

Arabidopsis/physiology , Information Storage and Retrieval/methods , Software , AGAMOUS Protein, Arabidopsis/metabolism , Arabidopsis/genetics , Arabidopsis/growth & development , Arabidopsis Proteins/metabolism , Internet , Systems Integration , Transcription Factors/metabolism

Overview of the protein-protein interaction annotation extraction task of BioCreative II.

Krallinger, Martin; Leitner, Florian; Rodriguez-Penagos, Carlos; Valencia, Alfonso.

Genome Biol ; 9 Suppl 2: S4, 2008.

Article in English | MEDLINE | ID: mdl-18834495

ABSTRACT

BACKGROUND: The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison of the different techniques, both among themselves and against manual curation, was missing. RESULTS: We designed a community challenge, the BioCreative II protein-protein interaction (PPI) task, based on the main steps of a manual protein interaction annotation workflow. It was structured into four distinct subtasks: (a) detection of protein interaction-relevant articles; (b) extraction and normalization of protein interaction pairs; (c) retrieval of the interaction detection methods used; and (d) retrieval of actual text passages that provide evidence for protein interactions. A total of 26 teams submitted runs for at least one of the proposed subtasks. In the interaction article detection subtask, the top-scoring team reached an F-score of 0.78. In the interaction pair extraction and mapping to SwissProt, a precision of 0.37 (with a recall of 0.33) was obtained. For associating articles with an experimental interaction detection method, an F-score of 0.65 was achieved. For the retrieval of the PPI passages that best summarize a given protein interaction in full-text articles, 19% of the passages returned by one of the runs corresponded to curator-selected sentences. Curators extracted only the passages that best summarized a given interaction, implying that many of the automatically extracted passages could contain interaction information without corresponding to the most informative sentences. CONCLUSION: The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific to each of the basic steps of the PPI extraction pipeline. The challenges identified range from problems in the format conversion of full-text articles to difficulties in detecting interactor protein pairs and then linking them to their database records. Some limitations were also encountered when using a single (and possibly incomplete) reference database for protein normalization, or when limiting the search for interactor proteins to co-occurrence within a single sentence, since a mention might span neighboring sentences. Finally, distinguishing novel, experimentally verified interactions (relevant for annotation) from previously known interactions adds further complexity to these tasks.
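The abstract reports F-scores for two subtasks but only precision and recall for pair extraction. For a rough comparison on the same scale, the balanced F-score (the harmonic mean of precision and recall) implied by the reported precision of 0.37 and recall of 0.33 can be computed as below. This is plain arithmetic on the published numbers, not a figure reported by the authors, and it assumes the standard F1 definition rather than any task-specific averaging scheme.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta: weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Pair extraction and SwissProt mapping: precision 0.37, recall 0.33 (from the abstract).
print(round(f_score(0.37, 0.33), 2))  # 0.35
```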

Subject(s)

Computational Biology/methods , Protein Interaction Mapping/methods , Societies, Scientific , Animals , Humans , Mice

Introducing meta-services for biomedical information extraction.

Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos; Hakenberg, Jörg; Plake, Conrad; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Hung, Hsi-Chuan; Lau, William W; Johnson, Calvin A; Saetre, Rune; Yoshida, Kazuhiro; Chen, Yan Hua; Kim, Sun; Shin, Soo-Yong; Zhang, Byoung-Tak; Baumgartner, William A; Hunter, Lawrence; Haddow, Barry; Matthews, Michael; Wang, Xinglong; Ruch, Patrick; Ehrler, Frédéric; Ozgür, Arzucan; Erkan, Günes; Radev, Dragomir R; Krauthammer, Michael; Luong, ThaiBinh; Hoffmann, Robert; Sander, Chris; Valencia, Alfonso.

Genome Biol ; 9 Suppl 2: S6, 2008.

Article in English | MEDLINE | ID: mdl-18834497

ABSTRACT

We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; http://bcms.bioinfo.cnio.es/). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The meta-server distributes the annotations in both human- and machine-readable formats (HTML/XML). The service is intended for biomedical researchers, database annotators, and the biomedical language processing community. The platform allows direct comparison, unified access, and aggregation of the annotations.
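The abstract does not say how BCMS aggregates the annotations returned by its participating servers. One simple scheme, shown here purely as an assumption, is vote counting: an annotation is kept if at least `min_votes` servers propose it. The function name, server names and annotation tuples below are made up for illustration.

```python
from collections import Counter

def aggregate(annotations, min_votes=2):
    """Count how many servers propose each annotation and keep those with >= min_votes.

    annotations: dict mapping a server name to an iterable of hashable annotations,
    e.g. ("gene_id", "12345"). Duplicates within one server count once.
    """
    votes = Counter()
    for server_annotations in annotations.values():
        votes.update(set(server_annotations))
    return {ann: n for ann, n in votes.items() if n >= min_votes}

# Hypothetical output of three annotation servers for one abstract.
servers = {
    "server_1": [("gene_id", "A"), ("gene_id", "B")],
    "server_2": [("gene_id", "A")],
    "server_3": [("gene_id", "A"), ("gene_id", "C")],
}
print(aggregate(servers))  # {('gene_id', 'A'): 3}
```

Raising `min_votes` trades recall for precision; with `min_votes=1` the result is simply the union of all servers' annotations.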

Subject(s)

Biomedical Research/methods , Computational Biology/methods , Information Storage and Retrieval , Internet , Humans

RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation.

Gama-Castro, Socorro; Jiménez-Jacinto, Verónica; Peralta-Gil, Martín; Santos-Zavaleta, Alberto; Peñaloza-Spinola, Mónica I; Contreras-Moreira, Bruno; Segura-Salazar, Juan; Muñiz-Rascado, Luis; Martínez-Flores, Irma; Salgado, Heladia; Bonavides-Martínez, César; Abreu-Goodger, Cei; Rodríguez-Penagos, Carlos; Miranda-Ríos, Juan; Morett, Enrique; Merino, Enrique; Huerta, Araceli M; Treviño-Quintanilla, Luis; Collado-Vides, Julio.

Nucleic Acids Res ; 36(Database issue): D120-4, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18158297

ABSTRACT

RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference database offering curated knowledge of the transcriptional regulatory network of Escherichia coli K12, currently the best-known electronically encoded database of the genetic regulatory network of any free-living organism. This paper summarizes the improvements, new biology and new features available in version 6.0. From now on, curation of the original literature is up to date for every new release. All objects are supported by their corresponding evidence, now classified as strong or weak. Transcription factors are classified by the origin of their effectors and by Gene Ontology class. We now have computational predictions for sigma(54) and five different promoter types of the sigma(70) family, as well as their corresponding -10 and -35 boxes. In addition to the promoters curated from the literature, we added about 300 experimentally mapped promoters from our own high-throughput mapping efforts. RegulonDB v.6.0 now expands beyond transcription initiation, including RNA regulatory elements, specifically riboswitches, attenuators and small RNAs, with their known associated targets. The data can be accessed through overviews of correlations about gene regulation. The original literature associated with RegulonDB, together with more than 4000 curation notes, can now be searched with the Textpresso text mining engine.

Subject(s)

Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , Computational Biology , Internet , Models, Genetic , Promoter Regions, Genetic , Regulatory Sequences, Ribonucleic Acid , Regulon , Sigma Factor/metabolism , Software , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic

Automatic reconstruction of a bacterial regulatory network using Natural Language Processing.

Rodríguez-Penagos, Carlos; Salgado, Heladia; Martínez-Flores, Irma; Collado-Vides, Julio.

BMC Bioinformatics ; 8: 293, 2007 Aug 07.

Article in English | MEDLINE | ID: mdl-17683642

ABSTRACT

BACKGROUND: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high-quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using text-mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. RESULTS: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually curated RegulonDB, 45% of which we were able to recreate automatically. Our automated analysis also found some new interactions in papers that had not yet been curated, or that had been missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language, better suited than SBML for simultaneously representing data of interest to biologists and text miners. CONCLUSION: Manual curation of the output of automatic text processing is a good way to complement a more detailed review of the literature, either for validating what has already been annotated or for discovering facts and information that might have been overlooked at the triage or curation stages.
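The abstract states that the system is rule-based and that it recreated 45% of RegulonDB, which is a recall-style measure. The sketch below is a deliberately tiny, hypothetical stand-in: two trigger-verb patterns turn sentences into (regulator, target, effect) edges, and recall is the fraction of reference edges recovered. The patterns, function names, toy sentences and reference set are all assumptions; the authors' grammar and their markup language are not reproduced.

```python
import re

# Hypothetical trigger-verb rules: (pattern capturing regulator and target, effect sign).
RULES = [
    (re.compile(r"\b(\w+) (?:activates|induces) (?:the )?(\w+)"), "+"),
    (re.compile(r"\b(\w+) (?:represses|inhibits) (?:the )?(\w+)"), "-"),
]

def extract_edges(sentences):
    """Apply every rule to every sentence; return a set of (regulator, target, effect) edges."""
    edges = set()
    for sentence in sentences:
        for pattern, effect in RULES:
            for regulator, target in pattern.findall(sentence):
                edges.add((regulator, target, effect))
    return edges

def recall(extracted, reference):
    """Fraction of reference edges that were recovered automatically."""
    return len(extracted & reference) / len(reference) if reference else 0.0

# Toy input and a toy reference network, for illustration only.
sentences = ["AraC activates the araBAD operon.", "LexA represses recA.", "CRP binds DNA."]
reference = {("AraC", "araBAD", "+"), ("LexA", "recA", "-"),
             ("FNR", "narG", "+"), ("ArcA", "sdhC", "-")}
edges = extract_edges(sentences)
print(recall(edges, reference))  # 0.5
```

Because an edge only counts when regulator, target and effect all match exactly, name normalization (synonyms, gene versus protein names) matters as much as the rules themselves.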

Subject(s)

Escherichia coli K12/metabolism , Escherichia coli Proteins/metabolism , Gene Expression Regulation/physiology , Models, Biological , Natural Language Processing , Periodicals as Topic , Signal Transduction/physiology , Abstracting and Indexing as Topic/methods , Artificial Intelligence , Computer Simulation , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA