Search | BVS Regional Portal

Genome-Wide Scale-Free Network Inference for Candida albicans.

Altwasser, Robert; Linde, Jörg; Buyko, Ekaterina; Hahn, Udo; Guthke, Reinhard.

Front Microbiol ; 3: 51, 2012.

Article in English | MEDLINE | ID: mdl-22355294

ABSTRACT

Discovery of essential genes in pathogenic organisms is an important step in the development of new medication. Despite the growing amount of genome data available, little is known about C. albicans, a major fungal pathogen. Most of the human population carries C. albicans as a commensal, but it can cause systemic infection that may lead to the death of the host if the immune system has deteriorated. In many organisms, central nodes in the interaction network (hubs) play a crucial role in information and energy transport. Knock-outs of such hubs often lead to lethal phenotypes, making them interesting drug targets. To identify these central genes via topological analysis, we inferred gene regulatory networks that are sparse and scale-free. We collected information from various sources to complement the limited expression data available. We utilized a linear regression algorithm to infer genome-wide gene regulatory interaction networks. To evaluate the predictive power of our approach, we used an automated text-mining system that scanned full-text research papers for known interactions. With the help of the compendium of known interactions, we also optimized the influence of the prior knowledge and the sparseness of the model to achieve the best results. We compare the results of our approach with those of other state-of-the-art network inference methods and show that our approach outperforms them. Finally, we identify a number of hubs in the genome of the fungus and investigate their biological relevance.
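
The hub-identification step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the inferred network is available as a list of (regulator, target) edges and that a hub is any gene whose total degree reaches a chosen threshold; the function name, gene labels, and threshold are hypothetical and are not taken from the paper, which may define hubs differently.

```python
from collections import Counter


def find_hubs(edges, min_degree):
    """Return genes whose total degree (in + out) is at least min_degree."""
    degree = Counter()
    for regulator, target in edges:
        degree[regulator] += 1
        degree[target] += 1
    return sorted(gene for gene, d in degree.items() if d >= min_degree)


# Hypothetical toy network; labels are placeholders, not real C. albicans genes.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
hubs = find_hubs(edges, min_degree=3)
print(hubs)
```

In a scale-free network, most genes have few connections and a small number have many, so a degree threshold of this kind singles out only a handful of candidates.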

Active Learning-based corpus annotation--the PathoJen experience.

Hahn, Udo; Beisswanger, Elena; Buyko, Ekaterina; Faessler, Erik.

AMIA Annu Symp Proc ; 2012: 301-10, 2012.

Article in English | MEDLINE | ID: mdl-23304300

ABSTRACT

We report on basic design decisions and novel annotation procedures underlying the development of PathoJen, a corpus of Medline abstracts annotated for pathological phenomena, including diseases as a proper subclass. This named entity type is known to be hard to delineate and capture by annotation guidelines. We here propose a two-category encoding schema where we distinguish short from long mention spans, the first covering standardized terminology (e.g. diseases), the latter accounting for less structured descriptive statements about norm-deviant states, as well as criteria and observations that might signal pathologies. The second design decision relates to the way annotation instances are sampled. Here we subscribe to an Active Learning-based approach which is known to save annotation costs without sacrificing annotation quality by means of a sample bias. By design, Active Learning picks up 'hard' to annotate instances for human annotators, whereas 'easier' ones are passed over to the automatic classifier whose models already incorporate and gradually improve with previous annotation experience.

Subject(s)

Algorithms, Artificial Intelligence, Pathology/classification, Problem-Based Learning, Humans, MEDLINE
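
The sampling idea in the abstract above, where "hard" instances go to human annotators, can be sketched as follows. The abstract does not name a selection strategy, so this sketch assumes uncertainty sampling for a binary classifier: instances whose predicted probability lies closest to 0.5 are selected first. The function name, instance identifiers, and probabilities are hypothetical.

```python
def select_for_annotation(probabilities, k):
    """Pick the k instances the classifier is least certain about.

    probabilities maps an instance id to the predicted probability of the
    positive class; values near 0.5 indicate the least certainty.
    """
    ranked = sorted(probabilities, key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Hypothetical classifier outputs for four unlabeled sentences.
predicted = {"s1": 0.95, "s2": 0.52, "s3": 0.10, "s4": 0.40}
to_annotate = select_for_annotation(predicted, k=2)
print(to_annotate)
```

After the selected instances are annotated, the classifier would be retrained and the selection repeated, which is what lets the model improve with each round.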

The extraction of pharmacogenetic and pharmacogenomic relations--a case study using PharmGKB.

Buyko, Ekaterina; Beisswanger, Elena; Hahn, Udo.

Pac Symp Biocomput ; : 376-87, 2012.

Article in English | MEDLINE | ID: mdl-22174293

ABSTRACT

In this paper, we report on adapting the JREX relation extraction engine, originally developed for the elicitation of protein-protein interaction relations, to the domains of pharmacogenetics and pharmacogenomics. We propose an intrinsic and an extrinsic evaluation scenario which is based on knowledge contained in the PharmGKB knowledge base. Porting JREX yields favorable results in the range of 80% F-score for Gene-Disease, Gene-Drug, and Drug-Disease relations.

Subject(s)

Knowledge Bases, Pharmacogenetics/statistics & numerical data, Aryl Hydrocarbon Hydroxylases/genetics, Aryl Hydrocarbon Hydroxylases/metabolism, Breast Neoplasms/drug therapy, Breast Neoplasms/genetics, Citalopram/pharmacokinetics, Computational Biology, Cytochrome P-450 CYP2C19, Factual Databases, Genetic Databases, Docetaxel, Female, BRCA2 Genes, Genetic Predisposition to Disease, Humans, Obesity/genetics, Pharmacogenetics/standards, Pharmacokinetics, Protein Interaction Mapping/statistics & numerical data, Taxoids/therapeutic use, Urocortins/genetics
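
The F-score reported in the abstract above is the harmonic mean of precision and recall. A minimal sketch of the calculation follows, assuming the balanced F1 variant and exact matching of extracted relations against a reference set; the function name and the placeholder relation tuples are hypothetical and do not come from PharmGKB or JREX.

```python
def f_score(predicted, gold):
    """Balanced F1 over two sets of relations, using exact matching."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Placeholder relations: (entity, relation type, entity).
gold = {("gene_a", "Gene-Drug", "drug_x"),
        ("gene_b", "Gene-Drug", "drug_y"),
        ("gene_c", "Gene-Disease", "disease_p"),
        ("drug_x", "Drug-Disease", "disease_p"),
        ("drug_y", "Drug-Disease", "disease_q")}
predicted = {("gene_a", "Gene-Drug", "drug_x"),
             ("gene_b", "Gene-Drug", "drug_y"),
             ("gene_c", "Gene-Disease", "disease_p"),
             ("drug_x", "Drug-Disease", "disease_p"),
             ("gene_d", "Gene-Disease", "disease_q")}
print(round(f_score(predicted, gold), 2))
```

Here four of the five predicted relations are correct and four of the five reference relations are found, so precision and recall are both 0.8.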

U-Compare bio-event meta-service: compatible BioNLP event extraction services.

Kano, Yoshinobu; Björne, Jari; Ginter, Filip; Salakoski, Tapio; Buyko, Ekaterina; Hahn, Udo; Cohen, K Bretonnel; Verspoor, Karin; Roeder, Christophe; Hunter, Lawrence E; Kilicoglu, Halil; Bergler, Sabine; Van Landeghem, Sofie; Van Parys, Thomas; Van de Peer, Yves; Miwa, Makoto; Ananiadou, Sophia; Neves, Mariana; Pascual-Montano, Alberto; Özgür, Arzucan; Radev, Dragomir R; Riedel, Sebastian; Sætre, Rune; Chun, Hong-Woo; Kim, Jin-Dong; Pyysalo, Sampo; Ohta, Tomoko; Tsujii, Jun'ichi.

BMC Bioinformatics ; 12: 481, 2011 Dec 18.

Article in English | MEDLINE | ID: mdl-22177292

ABSTRACT

BACKGROUND: Bio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offering optimal solutions for various purposes. RESULTS: We have integrated nine event extraction systems in the U-Compare framework, making them intercompatible and interoperable with other U-Compare components. The U-Compare event meta-service provides various meta-level features for comparison and ensemble of multiple event extraction systems. Experimental results show that the performance improvements achieved by the ensemble are significant. CONCLUSIONS: While individual event extraction systems themselves provide useful features for bio text mining, the U-Compare meta-service is expected to improve the accessibility to the individual systems, and to enable meta-level uses over multiple event extraction systems such as comparison and ensemble.

Subject(s)

Data Mining, Computer Systems, Periodicals as Topic, Software

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus.

Rebholz-Schuhmann, Dietrich; Jimeno Yepes, Antonio; Li, Chen; Kafkas, Senay; Lewin, Ian; Kang, Ning; Corbett, Peter; Milward, David; Buyko, Ekaterina; Beisswanger, Elena; Hornbostel, Kerstin; Kouznetsov, Alexandre; Witte, René; Laurila, Jonas B; Baker, Christopher Jo; Kuo, Cheng-Ju; Clematide, Simone; Rinaldi, Fabio; Farkas, Richárd; Móra, György; Hara, Kazuo; Furlong, Laura I; Rautschka, Michael; Neves, Mariana Lara; Pascual-Montano, Alberto; Wei, Qi; Collier, Nigel; Chowdhury, Md Faisal Mahbub; Lavelli, Alberto; Berlanga, Rafael; Morante, Roser; Van Asch, Vincent; Daelemans, Walter; Marina, José Luís; van Mulligen, Erik; Kors, Jan; Hahn, Udo.

J Biomed Semantics ; 2 Suppl 5: S11, 2011 Oct 06.

Article in English | MEDLINE | ID: mdl-22166494

ABSTRACT

BACKGROUND: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists at most of a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus has been used for the First CALBC Challenge, which asked the participants to annotate the corpus with their text processing solutions. RESULTS: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performances of the annotation solutions were lower for entities from the categories CHED and PRGE in comparison to the identification of entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants were used to generate the harmonised Silver Standard Corpus II (SSC-II), provided the participant had not made use of the annotated data set from the SSC-I for training purposes. The performances of the participants' solutions were again measured against the SSC-II. The performances of the annotation solutions again showed better results for DISO and SPE in comparison to CHED and PRGE. CONCLUSIONS: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs' annotation solutions in comparison to the SSC-I.

CALBC silver standard corpus.

Rebholz-Schuhmann, Dietrich; Jimeno Yepes, Antonio José; Van Mulligen, Erik M; Kang, Ning; Kors, Jan; Milward, David; Corbett, Peter; Buyko, Ekaterina; Beisswanger, Elena; Hahn, Udo.

J Bioinform Comput Biol ; 8(1): 163-79, 2010 Feb.

Article in English | MEDLINE | ID: mdl-20183881

ABSTRACT

The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization phase, the results produced from those different systems were integrated in a single harmonized corpus ("silver standard" corpus) by applying a voting scheme. We give an overview of the processed data and the principles of harmonization--formal boundary reconciliation and semantic matching of named entities. Finally, all submissions of the participants were evaluated against that silver standard corpus. We found that species and disease annotations are better standardized amongst the partners than the annotations of genes and proteins. The raw corpus is now available for additional named entity annotations. Parts of it will be made available later on for a public challenge. We expect that we can improve corpus building activities both in terms of the numbers of named entity classes being covered, as well as the size of the corpus in terms of annotated documents.

Subject(s)

Computational Biology/standards, Data Mining/standards, Cooperative Behavior, Data Mining/statistics & numerical data, Factual Databases/statistics & numerical data, Unified Medical Language System
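
The voting scheme mentioned in the abstract above can be sketched in simplified form. This sketch assumes each system delivers annotations as (start, end, semantic group) tuples and keeps an annotation only when at least a chosen number of systems agree on it exactly. The boundary reconciliation and semantic matching described in the abstract are not modelled here, and the function name, offsets, and vote threshold are hypothetical.

```python
from collections import Counter


def harmonize(system_annotations, min_votes):
    """Keep annotations proposed by at least min_votes systems (exact match)."""
    votes = Counter()
    for annotations in system_annotations:
        # Use a set so one system cannot vote twice for the same annotation.
        votes.update(set(annotations))
    return sorted(a for a, n in votes.items() if n >= min_votes)


# Hypothetical output of three annotation systems for one document.
system_a = [(0, 7, "SPE"), (20, 24, "PRGE")]
system_b = [(0, 7, "SPE"), (20, 26, "PRGE")]
system_c = [(0, 7, "SPE"), (20, 24, "PRGE"), (30, 38, "DISO")]
silver = harmonize([system_a, system_b, system_c], min_votes=2)
print(silver)
```

With exact matching, the two PRGE annotations that differ only in their end offset count as separate candidates, which shows why boundary reconciliation matters before votes are counted.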
