Results 1 - 5 of 5
1.
Bioinformatics ; 30(23): 3365-71, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25143286

ABSTRACT

MOTIVATION: Knowledge of drug-drug interactions (DDIs) is crucial for health-care professionals to avoid adverse effects when co-administering drugs to patients. As most newly discovered DDIs are made available through scientific publications, automatic DDI extraction is highly relevant. RESULTS: We propose a novel feature-based approach to extract DDIs from text. Our approach consists of three steps. First, we apply text preprocessing to convert input sentences from a given dataset into structured representations. Second, we map each candidate DDI pair from that dataset into a suitable syntactic structure. Based on that, a novel set of features is used to generate feature vectors for these candidate DDI pairs. Third, the obtained feature vectors are used to train a support vector machine (SVM) classifier. When evaluated on two DDI extraction challenge test datasets from 2011 and 2013, our system achieves F-scores of 71.1% and 83.5%, respectively, outperforming any state-of-the-art DDI extraction system. AVAILABILITY AND IMPLEMENTATION: The source code is available for academic use at http://www.biosemantics.org/uploads/DDI.zip.


Subject(s)
Data Mining/methods, Drug Interactions, Humans, Support Vector Machine
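The second step of the abstract above (mapping each candidate DDI pair to a feature vector) can be illustrated with a minimal sketch. The example sentence, the drug positions, and the two feature types (token distance and words between the mentions) are assumptions chosen for illustration; they are not the feature set described in the paper, and the SVM training step is left out.

```python
from itertools import combinations


def candidate_pairs(drug_positions):
    """Enumerate every unordered pair of drug mentions in one sentence."""
    return list(combinations(sorted(drug_positions), 2))


def pair_features(tokens, i, j):
    """Toy features for one candidate pair (hypothetical, not the paper's set)."""
    between = [t.lower() for t in tokens[i + 1:j]]
    feats = {"distance": j - i - 1}  # number of tokens between the two mentions
    for word in between:
        feats["between=" + word] = 1  # bag of words between the two mentions
    return feats


# Made-up sentence; token indices of the drug mentions are given by hand.
tokens = "Aspirin may increase the effect of warfarin and heparin".split()
drug_positions = [0, 6, 8]

pairs = candidate_pairs(drug_positions)
features = [pair_features(tokens, i, j) for i, j in pairs]

for pair, feats in zip(pairs, features):
    print(pair, feats)
```

Feature dictionaries of this kind would typically be converted to numeric vectors before being passed to a classifier such as an SVM.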
2.
Bioinformatics ; 28(20): 2654-61, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-22859502

ABSTRACT

MOTIVATION: The abundance of biomedical literature has attracted significant interest in novel methods to automatically extract biomedical relations from the literature. Until recently, most research was focused on extracting binary relations such as protein-protein interactions and drug-disease relations. However, these binary relations cannot fully represent the original biomedical data. Therefore, there is a need for methods that can extract fine-grained and complex relations known as biomedical events. RESULTS: In this article we propose a novel method to extract biomedical events from text. Our method consists of two phases. In the first phase, training data are mapped into structured representations. Based on that, templates are used to extract rules automatically. In the second phase, extraction methods are developed to process the obtained rules. When evaluated against the Genia event extraction abstract and full-text test datasets (Task 1), we obtain results with F-scores of 52.34 and 53.34, respectively, which are comparable to the state-of-the-art systems. Furthermore, our system achieves superior performance in terms of computational efficiency. AVAILABILITY: Our source code is available for academic use at http://dl.dropbox.com/u/10256952/BioEvent.zip.


Subject(s)
Data Mining/methods, Algorithms, Protein Interaction Mapping
3.
Bioinformatics ; 27(2): 259-65, 2011 Jan 15.
Article in English | MEDLINE | ID: mdl-21062765

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) play an important role in understanding biological processes. Although recent research in text mining has achieved significant progress in automatic PPI extraction from literature, performance of existing systems still needs to be improved. RESULTS: In this study, we propose a novel algorithm for extracting PPIs from literature which consists of two phases. First, we automatically categorize the data into subsets based on its semantic properties and extract candidate PPI pairs from these subsets. Second, we apply support vector machines (SVMs) to classify candidate PPI pairs using features specific to each subset. We obtain promising results on five benchmark datasets: AIMed, BioInfer, HPRD50, IEPA and LLL with F-scores ranging from 60% to 84%, which are comparable with the state-of-the-art PPI extraction systems. Furthermore, our system achieves the best performance on cross-corpora evaluation and comparative performance in terms of computational efficiency. AVAILABILITY: The source code and scripts used in this article are available for academic use at http://staff.science.uva.nl/~bui/PPIs.zip CONTACT: bqchinh@gmail.com.


Subject(s)
Data Mining/methods, Protein Interaction Mapping, Algorithms, Animals, Artificial Intelligence
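The abstracts in this listing report performance as F-scores. As a reminder of how that number is derived, here is a minimal sketch that assumes the balanced F1 variant (the harmonic mean of precision and recall) computed over sets of extracted pairs. The protein identifiers and the gold and predicted sets are invented for illustration and do not come from any of the benchmark corpora.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of extracted pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard and system output pairs.
gold = {("P1", "P2"), ("P2", "P3"), ("P4", "P5")}
predicted = {("P1", "P2"), ("P4", "P5"), ("P1", "P5"), ("P3", "P4")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```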
4.
BMC Bioinformatics ; 11: 101, 2010 Feb 23.
Article in English | MEDLINE | ID: mdl-20178611

ABSTRACT

BACKGROUND: In HIV treatment it is critical to have up-to-date resistance data for applicable drugs since HIV has a very high rate of mutation. These data are made available through scientific publications and must be extracted manually by experts in order to be used by virologists and medical doctors. Therefore, there is an urgent need for a tool that partially automates this process and is able to retrieve relations between drugs and virus mutations from literature. RESULTS: In this work we present a novel method to extract and combine relationships between HIV drugs and mutations in viral genomes. Our extraction method is based on natural language processing (NLP) which produces grammatical relations and applies a set of rules to these relations. We applied our method to a relevant set of PubMed abstracts and obtained 2,434 extracted relations with an estimated F-score of 84%. We then combined the extracted relations using logistic regression to generate resistance values for each pair. The results of this relation combination show more than 85% agreement with the Stanford HIVDB for the ten most frequently occurring mutations. The system is used in five hospitals from the Virolab project (http://www.virolab.org) to preselect the most relevant novel resistance data from literature and present those to virologists and medical doctors for further evaluation. CONCLUSIONS: The proposed relation extraction and combination method performs well at extracting HIV drug resistance data. It can be used in large-scale relation extraction experiments. The developed methods can also be applied to extract other types of relations such as gene-protein, gene-disease, and disease-mutation.


Subject(s)
Abstracting and Indexing/methods, Viral Drug Resistance/genetics, HIV Infections/drug therapy, Anti-HIV Agents/therapeutic use, Genetic Databases, Viral Genome, HIV Infections/genetics, Information Storage and Retrieval/methods, Internet, Mutation, PubMed
5.
Source Code Biol Med ; 9(1): 1, 2014 Jan 08.
Article in English | MEDLINE | ID: mdl-24401704

ABSTRACT

BACKGROUND: Cellular events play a central role in the understanding of biological processes and functions, providing insight into both physiological and pathogenic mechanisms. Automatic extraction of mentions of such events from the literature represents an important contribution to the progress of the biomedical domain, allowing faster updating of existing knowledge. The identification of trigger words indicating an event is a very important step in the event extraction pipeline, since the following task(s) rely on its output. This step presents various complex and unsolved challenges, namely the selection of informative features, the representation of the textual context, and the selection of a specific event type for a trigger word given this context. RESULTS: We propose TrigNER, a machine learning-based solution for biomedical event trigger recognition, which takes advantage of Conditional Random Fields (CRFs) with a high-end feature set, including linguistic-based, orthographic, morphological, local context and dependency parsing features. Additionally, a completely configurable algorithm is used to automatically optimize the feature set and training parameters for each event type. Thus, it automatically selects the features that have a positive contribution and automatically optimizes the CRF model order, n-gram sizes, vertex information and maximum hops for dependency parsing features. The final output consists of various CRF models, each one optimized to the linguistic characteristics of each event type. CONCLUSIONS: TrigNER was tested on the BioNLP 2009 shared task corpus, achieving a total F-measure of 62.7 and outperforming existing solutions on various event trigger types, namely gene expression, transcription, protein catabolism, phosphorylation and binding. The proposed solution allows researchers to easily apply complex and optimized techniques in the recognition of biomedical event triggers, making its application a simple routine task. We believe this work is an important contribution to the biomedical text mining community, contributing to improved and faster event recognition in scientific articles, and consequent hypothesis generation and knowledge discovery. This solution is freely available as open source at http://bioinformatics.ua.pt/trigner.
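To give a rough idea of what orthographic, morphological and local-context features for a sequence labeller might look like, here is a minimal sketch. The function name `token_features`, the specific features, the window size and the example sentence are all hypothetical; this is not TrigNER's actual feature set, and the CRF training and dependency-parsing features are omitted.

```python
def token_features(tokens, i, window=1):
    """Build a feature dict for token i (illustrative features only)."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        # Orthographic features
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        # Simple morphological features: character prefix and suffix
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
    }
    # Local context: neighbouring words inside the window, padded at the edges
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if inside else "<PAD>"
    return feats


# Made-up example sentence.
tokens = "IL-2 induces phosphorylation of STAT5".split()
feats = [token_features(tokens, i) for i in range(len(tokens))]
print(feats[2])
```

One feature dict per token, in sentence order, is the usual input shape for CRF toolkits, which then learn a label (for example an event type or "no trigger") for each position.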
