1.
PLoS One ; 9(9): e107187, 2014.
Article in English | MEDLINE | ID: mdl-25192339

ABSTRACT

Gene Ontology annotations help explain life science phenomena and provide substantial support for biomedical research. The use of the Gene Ontology is gradually changing the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with text mining methods and existing resources, we cast the task as a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the imbalance between the numbers of positive and negative training samples. The method also enhances the discriminating ability of the classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier, which is based on a tree structure, takes the relationships among target classes into account and thus resolves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value of 50.7% (precision: 52.7%, recall: 48.9%). The experimental results demonstrate that when the training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to any set of texts organized in an ontology structure or with a hierarchical relationship.
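
The two hierarchical ideas in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hierarchy, the document IDs, the function names, and the `is_positive` callback (standing in for a trained per-node classifier) are all hypothetical, and the propagation direction (child documents added to every ancestor) is an assumption based on the usual rule that an annotation to a term also holds for its ancestors.

```python
from collections import defaultdict


def propagate_documents(parent_of, docs_by_term):
    """Add each term's documents to the term itself and to all of its ancestors."""
    expanded = defaultdict(set)
    for term, docs in docs_by_term.items():
        node = term
        while node is not None:
            expanded[node].update(docs)
            node = parent_of.get(node)  # None once the root has been processed
    return dict(expanded)


def predict_top_down(children_of, is_positive, root="root"):
    """Visit a node's children only if the node itself was accepted."""
    accepted = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children_of.get(node, []):
            if is_positive(child):
                accepted.append(child)
                stack.append(child)
    return accepted


# Hypothetical toy hierarchy: root -> A -> A1, root -> B
parent_of = {"root": None, "A": "root", "B": "root", "A1": "A"}
children_of = {"root": ["A", "B"], "A": ["A1"]}
docs_by_term = {"A1": {"d1", "d2"}, "A": {"d3"}, "B": {"d4"}}

expanded = propagate_documents(parent_of, docs_by_term)
labels = predict_top_down(children_of, lambda term: term in {"A", "A1"})
```

In this sketch a small node such as `A` gains the documents of `A1` as extra training material, and a child label can only be predicted when its parent was accepted, so the predicted label set always forms a connected path from the root.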


Subject(s)
Computational Biology/methods; Gene Ontology; Genes/physiology; Algorithms; Animals; Gene Regulatory Networks; Humans; Models, Theoretical; Molecular Sequence Annotation/methods; Molecular Sequence Data; Structure-Activity Relationship
2.
IEEE Trans Nanobioscience ; 12(3): 173-81, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23974658

ABSTRACT

Protein-protein interactions (PPIs) play a key role in various aspects of the structural and functional organization of the cell. Knowledge of them reveals the molecular mechanisms of biological processes. However, the biomedical literature on protein interactions is growing rapidly, and it is difficult for interaction database curators to detect and curate protein interaction information manually. In this paper, we present a PPI extraction system, termed PPIExtractor, which automatically extracts PPIs from biomedical text and visualizes them. Given a MEDLINE record dataset, PPIExtractor first applies Feature Coupling Generalization (FCG) to tag protein names in text, next uses an extended semantic similarity-based method to normalize them, then combines feature-based, convolution tree and graph kernels to extract PPIs, and finally visualizes the PPI network. Experimental evaluations show that PPIExtractor can achieve state-of-the-art performance on a DIP subset with respect to comparable evaluations.
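
The abstract does not say how the feature-based, convolution tree and graph kernels are combined. One common option, shown here purely as an assumption, is a weighted sum of cosine-normalized Gram matrices; a non-negative weighted sum of valid kernels is itself a valid kernel. The function names, toy matrices, and weights below are hypothetical.

```python
import math


def normalize_gram(gram):
    """Cosine-normalize a Gram matrix so that every diagonal entry becomes 1."""
    n = len(gram)
    return [
        [gram[i][j] / math.sqrt(gram[i][i] * gram[j][j]) for j in range(n)]
        for i in range(n)
    ]


def combine_kernels(grams, weights):
    """Weighted sum of normalized Gram matrices computed over the same instances."""
    n = len(grams[0])
    combined = [[0.0] * n for _ in range(n)]
    for gram, weight in zip(grams, weights):
        normalized = normalize_gram(gram)
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * normalized[i][j]
    return combined


# Hypothetical Gram matrices for two candidate sentences under two kernels
feature_gram = [[4.0, 2.0], [2.0, 9.0]]
tree_gram = [[1.0, 0.5], [0.5, 1.0]]

combined = combine_kernels([feature_gram, tree_gram], weights=[0.5, 0.5])
```

Normalizing first keeps a kernel with large raw values from dominating the sum; the combined matrix could then be passed to any learner that accepts a precomputed kernel.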


Subject(s)
Computational Biology/methods; Data Mining/methods; Databases, Protein; Protein Interaction Maps; Algorithms; Proteins/chemistry; Proteins/metabolism; User-Computer Interface
3.
PLoS One ; 7(9): e43558, 2012.
Article in English | MEDLINE | ID: mdl-22984434

ABSTRACT

The recognition and normalization of gene mentions in biomedical literature are crucial steps in biomedical text mining. We present a system for extracting gene names from biomedical literature and normalizing them to gene identifiers in databases. The system consists of four major components: gene name recognition, entity mapping, disambiguation and filtering. The first component is a gene name recognizer based on dictionary matching and semi-supervised learning, which uses the co-occurrence information of a large number of unlabeled MEDLINE abstracts to enhance the feature representation of gene named entities. In the entity mapping stage, we combine exact and approximate matching strategies to link gene names in context to the EntrezGene database. For gene names that map to more than one database identifier, we develop a disambiguation method based on semantic similarity derived from the Gene Ontology and MEDLINE abstracts. To remove the noise produced in the previous steps, we design a filtering method based on the confidence scores in the dictionary used for NER. The system is able to adjust the trade-off between precision and recall based on the result of filtering. It achieves an F-measure of 83% (precision: 82.5%, recall: 83.5%) on the BioCreative II Gene Normalization (GN) dataset, which is comparable to the current state of the art.
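
The entity mapping stage (exact match first, approximate match as a fallback) can be sketched as follows. This is an illustration under stated assumptions, not the system described in the abstract: the lexicon entries, the `GENE:` identifiers, the normalization rule, and the 0.85 similarity cutoff are all hypothetical, and Python's standard `difflib` stands in for whatever approximate matcher the authors used.

```python
import difflib
import re


def normalize(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def map_mention(mention, lexicon, cutoff=0.85):
    """Return (candidate identifiers, match type) for one gene mention."""
    key = normalize(mention)
    if key in lexicon:
        return sorted(lexicon[key]), "exact"
    # Fall back to the single closest lexicon key above the similarity cutoff
    close = difflib.get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    if close:
        return sorted(lexicon[close[0]]), "approximate"
    return [], "unmapped"


# Hypothetical lexicon: normalized name -> set of made-up identifiers
lexicon = {
    normalize("alpha kinase 1"): {"GENE:0001"},
    normalize("AK1"): {"GENE:0001", "GENE:0002"},
}

exact_hit = map_mention("Alpha-Kinase 1", lexicon)
fuzzy_hit = map_mention("alpha kinases 1", lexicon)
ambiguous_hit = map_mention("AK-1", lexicon)
miss = map_mention("zzz", lexicon)
```

A mention such as `AK-1` that returns more than one identifier is the case the abstract hands to the disambiguation component; raising or lowering `cutoff` is one simple way to trade recall against precision in the approximate step.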


Subject(s)
Genes; Terminology as Topic; Databases as Topic; Dictionaries as Topic; Molecular Sequence Annotation; Reference Standards