Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 2 de 2
Filtrar
Más filtros










Base de datos
Intervalo de año de publicación
1.
J Biomed Inform ; 87: 79-87, 2018 11.
Artículo en Inglés | MEDLINE | ID: mdl-30296491

RESUMEN

This paper proposes an effective and robust approach for Chemical-Induced Disease (CID) relation extraction from PubMed articles. The study was performed on the Chemical Disease Relation (CDR) task of BioCreative V track-3 corpus. The proposed system, named relSCAN, is an efficient CID relation extraction system with two phases to classify relation instances from the Co-occurrence and Non-Co-occurrence mention levels. We describe the case of chemical and disease mentions that occur in the same sentence as 'Co-occurrence', or as 'Non-Co-occurrence' otherwise. In the first phase, the relation instances are constructed on both mention levels. In the second phase, we employ a hybrid feature set to classify the relation instances at both of these mention levels using the combination of two Machine Learning (ML) classifiers (Support Vector Machine (SVM) and J48 Decision tree). This system is entirely corpus dependent and does not rely on information from external resources in order to boost its performance. We achieved good results, which are comparable with the other state-of-the-art CID relation extraction systems on the BioCreative V corpus. Furthermore, our system achieves the best performance on the Non-Co-occurrence mention level.


Asunto(s)
Trastornos Químicamente Inducidos/diagnóstico , Minería de Datos/métodos , Informática Médica/métodos , Máquina de Vectores de Soporte , Algoritmos , Toma de Decisiones , Enfermedad , Humanos , Aprendizaje Automático , Publicaciones , Distribución Aleatoria , Análisis de Regresión
2.
Biomed Res Int ; 2016: 4248026, 2016.
Artículo en Inglés | MEDLINE | ID: mdl-26942193

RESUMEN

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization where raw text is segmented into tokens. This study proposes an enhanced rule based tokenizer, ChemTok, which utilizes rules extracted mainly from the train data set. The main novelty of ChemTok is the use of the extracted rules in order to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperforms all classifiers trained on the output of the other two tokenizers in terms of classification performance, and the number of incorrectly segmented entities.


Asunto(s)
Algoritmos , Fenómenos Químicos , Minería de Datos/métodos , Procesamiento de Lenguaje Natural , Aprendizaje Automático , Programas Informáticos , Máquina de Vectores de Soporte , Terminología como Asunto
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA