Results 1 - 5 of 5
1.
BMC Med Inform Decis Mak ; 15 Suppl 1: S9, 2015.
Article in English | MEDLINE | ID: mdl-26043907

ABSTRACT

BACKGROUND: Bio-entity extraction is a pivotal component of information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. METHODS: This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, in the stage that merges similar entities, the proposed technique adopts text mining techniques such as Part of Speech (POS) expansion, stemming, and the exploitation of contextual cues to further improve performance. RESULTS: The experimental results show that the proposed technique achieves the best, or at least equivalent, F-measure among the compared techniques: GENIA, MeSH, UMLS, and combinations of these three resources. CONCLUSIONS: The results imply that the performance of dictionary-based extraction techniques is largely influenced by the information resources used to build the dictionary. In addition, the edit distance algorithm shows steady precision across the three different dictionaries, whereas the context-only technique achieves high-end recall across the three different dictionaries.


Subject(s)
Data Mining/methods , Medical Informatics/methods , Controlled Vocabulary
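
The abstract of record 1 describes improving recall by matching text against a dictionary with an edit distance algorithm. The following is a minimal sketch of that general idea, not the authors' method: it substitutes the standard Levenshtein distance for the paper's "shortest path edit distance" variant, and the dictionary entries, the `max_distance` threshold, and the function names are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance: insert, delete, substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def match_entities(tokens, dictionary, max_distance=1):
    """Map each token to its closest dictionary entry if it is near enough."""
    matches = []
    for token in tokens:
        best = min(dictionary,
                   key=lambda entry: edit_distance(token.lower(), entry.lower()))
        distance = edit_distance(token.lower(), best.lower())
        if distance <= max_distance:
            matches.append((token, best, distance))
    return matches


# Hypothetical toy dictionary and input text (assumptions for illustration).
DICTIONARY = ["interleukin", "insulin", "p53"]
TOKENS = "the interleukn and insulin levels".split()

matches = match_entities(TOKENS, DICTIONARY)
for token, entry, distance in matches:
    print(f"{token!r} -> {entry!r} (distance {distance})")
```

Allowing a small non-zero distance is what lets the misspelled token "interleukn" still map to "interleukin"; raising the threshold trades precision for recall.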
2.
BMC Bioinformatics ; 12 Suppl 12: S4, 2011 Nov 24.
Article in English | MEDLINE | ID: mdl-22168401

ABSTRACT

BACKGROUND: Protein-protein interaction (PPI) extraction has been a focal point of much biomedical research and of many database curation tools. Both Active Learning (AL) and semi-supervised SVMs have recently been applied to extract PPIs automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve performance on the PPI task. METHODS: We propose a novel PPI extraction technique called PPISpotter, which combines Deterministic Annealing-based SSL with an AL technique to extract protein-protein interactions. In addition, we extract a comprehensive set of features from MEDLINE records using Natural Language Processing (NLP) techniques, which further improves the SVM classifiers. Our feature selection technique incorporates syntactic, semantic, and lexical properties of the text, which boosts system performance significantly. RESULTS: By conducting experiments with three different PPI corpora, we show that PPISpotter is superior in precision, recall, and F-measure to the other techniques incorporated into semi-supervised SVMs, such as Random Sampling, Clustering, and Transductive SVMs. CONCLUSIONS: Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.


Subject(s)
Artificial Intelligence , Natural Language Processing , Proteins/metabolism , Humans , MEDLINE , Protein Interaction Maps , Proteins/chemistry , Semantics , United States
3.
BMC Bioinformatics ; 12 Suppl 2: S6, 2011 Mar 29.
Article in English | MEDLINE | ID: mdl-21489225

ABSTRACT

BACKGROUND: The Resource Description Framework (RDF) data model is widely used for modeling and sharing online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) and Bio2RDF (bio2rdf.org). As a result, SPARQL, a W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' information requests are diverse and users often do not know the exact value of each fact in an RDF database, it is desirable to query RDF data using SPARQL queries with regular expression patterns. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed for querying a text corpus, or support only matching over the paths in an RDF graph. RESULTS: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model so that the proposed framework can be adapted to existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. CONCLUSIONS: Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.


Subject(s)
Computational Biology/methods , Factual Databases , Information Storage and Retrieval/methods , Software , Algorithms , Internet , Knowledge Bases , Programming Languages , Semantics
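
To make concrete what a SPARQL query with a regular expression pattern asks for (record 3), the sketch below evaluates the equivalent of a `FILTER regex(...)` clause by scanning an in-memory list of triples. This is only the naive baseline that an indexing framework would aim to beat; it is not the framework from the paper. The triples, the `ex:` identifiers, and the function name `select_subjects` are hypothetical.

```python
import re

# Hypothetical toy data set of (subject, predicate, object) triples.
TRIPLES = [
    ("ex:P1", "rdfs:label", "Tyrosine-protein kinase ABL1"),
    ("ex:P2", "rdfs:label", "Insulin receptor"),
    ("ex:P3", "rdfs:label", "Serine/threonine-protein kinase mTOR"),
    ("ex:P1", "ex:organism", "Homo sapiens"),
]


def select_subjects(triples, predicate, pattern, flags=0):
    """Naive full scan, roughly corresponding to:

        SELECT ?s WHERE { ?s <predicate> ?o . FILTER regex(?o, "<pattern>") }

    Every triple is visited and every candidate literal is tested,
    so the cost grows linearly with the size of the data set.
    """
    rx = re.compile(pattern, flags)
    return [s for s, p, o in triples if p == predicate and rx.search(o)]


# Find subjects whose label ends in "kinase <name>"; the exact value is unknown.
result = select_subjects(TRIPLES, "rdfs:label", r"kinase\s+\w+$")
print(result)
```

The scan uses `search` rather than `fullmatch` because SPARQL's `regex` matches anywhere in the string unless the pattern is anchored.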
4.
BMC Bioinformatics ; 11 Suppl 2: S6, 2010 Apr 16.
Article in English | MEDLINE | ID: mdl-20406504

ABSTRACT

BACKGROUND: Finding relevant articles in PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to a learned relevance function. However, learning and ranking are usually done offline without being integrated with keyword queries, and users have to provide a large number of training documents to reach reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed. RESULTS: RefMed supports multi-level relevance feedback by using the RankSVM as the learning method, and thus achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into an RDBMS to support both keyword queries and multi-level relevance feedback in real time; the tight coupling of the RankSVM and the DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. RefMed thereby achieves high learning accuracy in real time without a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. CONCLUSIONS: RefMed is the first multi-level relevance feedback system for PubMed, and it achieves high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.


Subject(s)
Algorithms , Artificial Intelligence , Computational Biology/methods , Database Management Systems , PubMed , Statistical Data Interpretation , Feedback , Reproducibility of Results , User-Computer Interface
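
Record 4 relies on the RankSVM to learn a relevance function from multi-level feedback. The sketch below illustrates the general pairwise idea behind a linear ranking SVM: graded feedback is turned into preference pairs, and a weight vector is fitted with hinge-loss subgradient steps. It is a simplified stand-in, not RefMed's implementation; the feature vectors, the relevance levels, and the hyperparameters `lr`, `lam`, and `epochs` are all assumptions.

```python
def preference_pairs(docs):
    """Turn graded feedback into difference vectors x_a - x_b with level_a > level_b."""
    pairs = []
    for xa, la in docs:
        for xb, lb in docs:
            if la > lb:
                pairs.append([a - b for a, b in zip(xa, xb)])
    return pairs


def train_linear_ranker(docs, lr=0.1, lam=0.01, epochs=100):
    """Fit w so that w . (x_a - x_b) >= 1 for every preferred pair (hinge loss)."""
    pairs = preference_pairs(docs)
    w = [0.0] * len(docs[0][0])
    for _ in range(epochs):
        for d in pairs:
            margin = sum(wi * di for wi, di in zip(w, d))
            # L2 shrinkage is applied on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The pair is violated or inside the margin: move w towards d.
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w


def rank(docs, w):
    """Return document indices sorted by descending learned score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x, _ in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical feedback: (feature vector, relevance level), where
# 2 = highly relevant, 1 = partially relevant, 0 = not relevant.
FEEDBACK = [([1.0, 0.0], 2), ([0.5, 0.5], 1), ([0.0, 1.0], 0)]

weights = train_linear_ranker(FEEDBACK)
ranking = rank(FEEDBACK, weights)
print(ranking)
```

With more than two feedback levels, each judged document contributes several preference pairs, which is one intuition for why multi-level feedback can reach a given accuracy with fewer judged documents.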
5.
Int J Data Min Bioinform ; 11(2): 205-22, 2015.
Article in English | MEDLINE | ID: mdl-26255383

ABSTRACT

The Resource Description Framework (RDF) is widely used for sharing biomedical data, such as gene ontology or the online protein database UniProt. SPARQL is a native query language for RDF, featuring regular expressions in queries for which exact values are either irrelevant or unknown. The use of regular expression indexes in SPARQL query processing improves the performance of queries containing regular expressions by up to two orders of magnitude. In this study, we address the update operation for regular expression indexes in RDF databases. We identify major performance problems of straightforward index update algorithms and propose a new algorithm that utilises unique properties of regular expression indexes to increase performance. Our contributions can be summarised as follows: (1) we propose an efficient update algorithm for regular expression indexes in RDF databases, (2) we build a prototype system for the proposed algorithm in C++ and (3) we conduct extensive experiments demonstrating the improvement of our algorithm over the straightforward approaches by an order of magnitude.


Subject(s)
Algorithms , Biological Ontologies , Database Management Systems , Factual Databases , Information Storage and Retrieval/methods , Natural Language Processing , Programming Languages , Semantics , Terminology as Topic