Results 1-7 of 7
1.
J Bioinform Comput Biol ; 7(2): 269-85, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19340915

ABSTRACT

In the past decade, many automated methods for predicting the subcellular localization of proteins have been proposed, using a wide range of principles and learning approaches. Based on an experimental evaluation of different methods and their theoretical properties, we propose combining a well-balanced set of existing approaches into new, ensemble-based prediction methods. The experimental evaluation shows that our ensembles improve substantially over the underlying base methods.
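The abstract does not state how the base predictors are combined. As a minimal sketch of one common ensemble scheme, weighted majority voting, assume that each base method outputs a single compartment label per protein and carries a fixed vote weight (for example its accuracy on held-out data). The function weighted_vote, the compartment names, and the weights are hypothetical, and this is not presented as the paper's combination rule.

```python
from typing import Dict, Sequence


def weighted_vote(predictions: Sequence[str], weights: Sequence[float]) -> str:
    """Combine one localization label per base method into a single label.

    predictions[i] is the compartment predicted by (hypothetical) base method i;
    weights[i] is that method's vote weight, e.g. its held-out accuracy.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per base method")
    scores: Dict[str, float] = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    # Scan labels in sorted order so ties go to the alphabetically first label.
    return max(sorted(scores), key=lambda label: scores[label])


# Three hypothetical base predictors disagree on one protein.
print(weighted_vote(["nucleus", "cytoplasm", "nucleus"], [0.7, 0.9, 0.6]))  # prints "nucleus"
```

Weighted voting needs only the base methods' final labels. Schemes that combine per-class probabilities or train a second-level classifier are alternatives when the base methods expose scores.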


Subject(s)
Algorithms , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein/methods , Subcellular Fractions/chemistry , Subcellular Fractions/metabolism , Amino Acid Sequence , Molecular Sequence Data , Structure-Activity Relationship
2.
BMC Bioinformatics ; 9: 207, 2008 Apr 23.
Article in English | MEDLINE | ID: mdl-18433469

ABSTRACT

BACKGROUND: The increasing amount of published literature in biomedicine represents an immense source of knowledge, which can be accessed efficiently only by a new generation of automated information extraction tools. Named entity recognition of well-defined objects, such as genes or proteins, has achieved a sufficient level of maturity that it can form the basis for the next step: the extraction of relations that exist between the recognized entities. Whereas most early work focused on the mere detection of relations, the classification of the type of relation is also of great importance, and this is the focus of this work. In this paper, we describe an approach that extracts both the existence of a relation and its type. Our work is based on Conditional Random Fields, which have been applied with much success to the task of named entity recognition. RESULTS: We benchmark our approach on two different tasks. The first task is the identification of semantic relations between diseases and treatments. The available data set consists of manually annotated PubMed abstracts. The second task is the identification of relations between genes and diseases from a set of concise phrases, so-called GeneRIF (Gene Reference Into Function) phrases. In our experimental setting, we do not assume that the entities are given, as is often the case in previous relation extraction work. Rather, the extraction of the entities is solved as a subproblem. Compared with other state-of-the-art approaches, we achieve very competitive results on both data sets. To demonstrate the scalability of our solution, we apply our approach to the complete human GeneRIF database. The resulting gene-disease network contains 34758 semantic associations between 4939 genes and 1745 diseases. The gene-disease network is publicly available as a machine-readable RDF graph.
CONCLUSION: We extend the framework of Conditional Random Fields towards the annotation of semantic relations from text and apply it to the biomedical domain. Our approach is based on a rich set of textual features and achieves a performance that is competitive with leading approaches. The model is quite general and can be extended to handle arbitrary biological entities and relation types. The resulting gene-disease network shows that the GeneRIF database provides a rich knowledge source for text mining. Current work focuses on improving the detection of entities and of entity boundaries, which will also greatly improve relation extraction performance.
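Once a linear-chain Conditional Random Field has been trained, labeling a sentence reduces to finding the highest-scoring label sequence, which the Viterbi algorithm does in O(T·K²) time for T tokens and K labels. The following is a minimal decoding sketch. It assumes the per-token label scores (emissions) and label-to-label transition scores have already been computed from a trained model. The label set and every number below are hypothetical, feature extraction and training are not shown, and the paper's actual label encoding for relation types is not reproduced here.

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            start: Sequence[float]) -> Tuple[List[int], float]:
    """Return the highest-scoring label sequence of a linear-chain CRF and its score.

    emissions[t][k]   score of label k at token t (log-space)
    transitions[i][j] score of label i being followed by label j
    start[k]          score of starting the sequence with label k
    """
    num_labels = len(start)
    score = [start[k] + emissions[0][k] for k in range(num_labels)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_labels):
            # Best previous label i given current label j.
            best_i = max(range(num_labels), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(num_labels), key=lambda k: score[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


labels = ["O", "DISEASE", "TREATMENT"]  # hypothetical label set
emissions = [[0.0, 0.5, 2.0],   # "aspirin"
             [2.0, 0.0, 0.0],   # "relieves"
             [0.0, 2.0, 0.5]]   # "headache"
transitions = [[0.0, 0.0, 0.0],
               [0.0, 0.5, 0.0],
               [0.0, -1.0, 0.0]]
start = [0.0, 0.0, 0.0]
path, best = viterbi(emissions, transitions, start)
print([labels[k] for k in path], best)  # prints ['TREATMENT', 'O', 'DISEASE'] 6.0
```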


Subject(s)
Database Management Systems , Natural Language Processing , Biomedical Research/methods , Database Management Systems/standards , Database Management Systems/statistics & numerical data , Databases, Genetic , Disease/classification , Disease/etiology , Genes/physiology , Humans , MEDLINE , Models, Statistical , Semantics , Sequence Analysis , Terminology as Topic , Therapeutics/classification , Vocabulary, Controlled
3.
Bioinformatics ; 22(14): e49-57, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16873512

ABSTRACT

MOTIVATION: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectations under the two distributions. Consequently, we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors but also to strings, sequences, graphs, and other common structured data types arising in molecular biology. RESULTS: We study the practical feasibility of an MMD-based test on three central data integration tasks: testing cross-platform comparability of microarray data, cancer diagnosis, and data-content-based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. CONCLUSIONS: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments. AVAILABILITY: http://www.dbs.ifi.lmu.de/~borgward/MMD.
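The MMD statistic can be estimated from the two samples using kernel evaluations alone. The following is a minimal sketch of the standard unbiased estimator of the squared MMD with a Gaussian kernel on real-valued vectors, plus a permutation test that turns it into a p-value. The bandwidth sigma, the number of permutations, and the use of permutations to approximate the null distribution are assumptions of this sketch and not necessarily the paper's procedure. Kernels on strings or graphs would replace gaussian_kernel without changing the rest.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, sigma: float) -> float:
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(X: Sequence[Vector], Y: Sequence[Vector], sigma: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between samples X (size m >= 2) and Y (size n >= 2)."""
    m, n = len(X), len(Y)
    k_xx = sum(gaussian_kernel(X[i], X[j], sigma) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(Y[i], Y[j], sigma) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(gaussian_kernel(x, y, sigma) for x in X for y in Y)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def mmd_permutation_test(X: Sequence[Vector], Y: Sequence[Vector], sigma: float = 1.0,
                         n_perm: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Return (MMD^2 estimate, permutation p-value) for H0: X and Y share a distribution."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(X, Y, sigma)
    pooled: List[Vector] = list(X) + list(Y)
    m = len(X)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled sample under H0
        if mmd2_unbiased(pooled[:m], pooled[m:], sigma) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


X = [[0.0], [0.1], [0.2], [0.3], [0.4]]
Y = [[5.0], [5.1], [5.2], [5.3], [5.4]]
stat, p = mmd_permutation_test(X, Y)
print(f"MMD^2 = {stat:.3f}, p = {p:.3f}")
```

The unbiased estimator can be slightly negative when both samples come from the same distribution, which is expected and is why the decision is based on the permutation p-value rather than on the sign of the estimate.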


Subject(s)
Algorithms , Computational Biology/methods , Data Interpretation, Statistical , Databases, Factual , Information Storage and Retrieval/methods , Models, Biological , Models, Statistical , Computer Simulation , Sample Size , Statistical Distributions , Systems Integration
4.
Med Image Comput Comput Assist Interv ; 14(Pt 2): 607-14, 2011.
Article in English | MEDLINE | ID: mdl-21995079

ABSTRACT

Registering CT scans within a body atlas is an important technique for aligning and comparing different CT scans. It is also required for automatically navigating to certain regions of a scan or for automatically identifying subvolumes. Common solutions to this problem employ landmark detectors and interpolation techniques. However, these solutions are often not applicable if the query scan is very small or consists of only a single slice. The research community has therefore proposed methods that are independent of landmark detectors and instead use imaging techniques to register the slices on a generalized height scale. In this paper, we propose an improved prediction method for registering single slices. Our solution is based on specialized image descriptors and instance-based learning. The experimental evaluation shows that the new method improves on the accuracy and stability of comparable registration methods while requiring only a single CT slice for registration.
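Instance-based learning here means predicting a slice's position from the most similar slices whose positions are known. The following is a minimal k-nearest-neighbor sketch. It assumes each slice has already been reduced to a fixed-length descriptor vector and each training slice is annotated with its height on the generalized height scale. The descriptors, the Euclidean distance, the median aggregation, and k are illustrative choices, not the paper's specialized descriptors or its exact prediction rule.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def predict_height(train: List[Tuple[Descriptor, float]], query: Descriptor, k: int = 3) -> float:
    """Predict a query slice's height as the median height of its k nearest training slices."""
    if not train:
        raise ValueError("training set is empty")
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    neighbor_heights = [height for _, height in by_distance[:k]]
    # The median is robust to a single neighbor from a very different body region.
    return statistics.median(neighbor_heights)


# Hypothetical 2-D descriptors with known heights.
train = [([0.0, 0.0], 10.0), ([1.0, 0.0], 20.0), ([2.0, 0.0], 30.0), ([10.0, 0.0], 100.0)]
print(predict_height(train, [1.1, 0.0]))  # prints 20.0
```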


Subject(s)
Image Processing, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Child , Child, Preschool , Female , Humans , Male , Middle Aged , Models, Statistical , Neck/pathology , Radiography, Thoracic/methods , Software
5.
Pac Symp Biocomput ; : 4-15, 2007.
Article in English | MEDLINE | ID: mdl-17992741

ABSTRACT

It is widely believed that comparing discrepancies in the protein-protein interaction (PPI) networks of individuals will become an important tool in understanding and preventing diseases. Currently, PPI networks for individuals are not available, but gene expression data is becoming easier to obtain and allows us to represent individuals by a co-integrated gene expression/protein interaction network. Two major problems hamper the use of graph kernels (state-of-the-art methods for whole-graph comparison) to compare PPI networks. First, these methods do not scale to graphs the size of a PPI network. Second, missing edges in these interaction networks are biologically relevant for detecting discrepancies, yet these methods do not take them into account. In this article, we present graph kernels for biological network comparison that are fast to compute and take missing interactions into account. We evaluate their practical performance on two datasets of co-integrated gene expression/PPI networks.
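One way to let missing interactions count is to compare not only two networks but also their complement graphs, in which an edge marks an interaction that is absent. The following is a minimal sketch under stated assumptions. Networks are undirected graphs given as a node count and a set of edges. They are compared with a fixed-length random walk kernel on the direct product graph, and the composite kernel is the sum of the kernel on the networks and the kernel on their complements. The function names, the walk length, and the decay factor lam are hypothetical. This brute-force construction also does nothing about the scalability problem the paper addresses.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Graph = Tuple[int, Set[FrozenSet[int]]]  # (number of nodes, undirected edges)


def make_graph(n: int, edges: Iterable[Tuple[int, int]]) -> Graph:
    return n, {frozenset(e) for e in edges}


def complement(g: Graph) -> Graph:
    """Graph whose edges are exactly the node pairs with no edge in g."""
    n, edges = g
    return n, {frozenset(p) for p in combinations(range(n), 2)} - edges


def walk_kernel(g1: Graph, g2: Graph, length: int = 3, lam: float = 0.1) -> float:
    """sum_{l=0..length} lam^l * (number of walks of length l in the direct product graph)."""
    (n1, e1), (n2, e2) = g1, g2
    nodes = [(u, v) for u in range(n1) for v in range(n2)]
    # (u, v) ~ (u2, v2) in the product graph iff u ~ u2 in g1 and v ~ v2 in g2.
    neighbors = [[b for b, (u2, v2) in enumerate(nodes)
                  if frozenset((u, u2)) in e1 and frozenset((v, v2)) in e2]
                 for (u, v) in nodes]
    walks = [1.0] * len(nodes)  # walks of length 0 starting at each product node
    total, weight = float(len(nodes)), 1.0
    for _ in range(length):
        # Walks of length l from a = sum of walks of length l-1 from a's neighbors.
        walks = [sum(walks[b] for b in nbrs) for nbrs in neighbors]
        weight *= lam
        total += weight * sum(walks)
    return total


def composite_kernel(g1: Graph, g2: Graph, length: int = 3, lam: float = 0.1) -> float:
    """Kernel on the networks plus kernel on their complements (missing interactions)."""
    return walk_kernel(g1, g2, length, lam) + walk_kernel(complement(g1), complement(g2), length, lam)


path = make_graph(3, [(0, 1), (1, 2)])
print(walk_kernel(path, path, length=1, lam=1.0))       # prints 25.0
print(composite_kernel(path, path, length=1, lam=1.0))  # prints 38.0
```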


Subject(s)
Protein Interaction Mapping/statistics & numerical data , Computational Biology , Databases, Genetic , Disease Progression , Gene Expression Profiling/statistics & numerical data , Humans , Prognosis , Protein Array Analysis/statistics & numerical data
6.
Pac Symp Biocomput ; : 547-58, 2006.
Article in English | MEDLINE | ID: mdl-17094268

ABSTRACT

We present a kernel-based approach to the classification of time series of gene expression profiles. Our method takes into account the dynamic evolution over time as well as the temporal characteristics of the data. More specifically, we model the evolution of the gene expression profiles as a Linear Time Invariant (LTI) dynamical system and estimate its model parameters. A kernel on dynamical systems is then used to classify these time series. We successfully test our approach on a published dataset to predict response to drug therapy in Multiple Sclerosis patients. For pharmacogenomics, our method offers considerable potential as a basis for advanced computational tools for disease diagnosis and for the prognosis of disease and drug therapy outcomes.
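The two steps described above, fitting an LTI model and comparing fitted systems with a kernel, can be sketched as follows. This sketch rests on several assumptions. Each patient's series is a (T × d) array of expression values. The model is the noise-free state equation x_{t+1} ≈ A x_t, fitted by least squares. The kernel is a truncated, discounted sum of inner products between the two systems' simulated trajectories, a simplified form in the spirit of the trace kernels for dynamical systems (Binet-Cauchy family). The paper's exact estimator and kernel may differ. With many genes and few time points, a real application would also need dimensionality reduction or regularization before the fit.

```python
import numpy as np


def fit_lti(series: np.ndarray) -> np.ndarray:
    """Least-squares estimate of A in x_{t+1} ≈ A x_t from a (T, d) series."""
    x_prev, x_next = series[:-1], series[1:]
    # Row form: x_prev @ A.T ≈ x_next, so lstsq returns A transposed.
    a_transposed, *_ = np.linalg.lstsq(x_prev, x_next, rcond=None)
    return a_transposed.T


def trajectory_kernel(a1, x1, a2, x2, lam: float = 0.9, horizon: int = 20) -> float:
    """sum_{t < horizon} lam^t <A1^t x1, A2^t x2>: discounted trajectory inner product."""
    s1 = np.asarray(x1, dtype=float)
    s2 = np.asarray(x2, dtype=float)
    k = 0.0
    for t in range(horizon):
        k += lam ** t * float(s1 @ s2)
        s1, s2 = a1 @ s1, a2 @ s2  # advance both systems one time step
    return k


# Simulate a noise-free two-gene series from a known (hypothetical) system and recover A.
a_true = np.array([[0.9, 0.1], [0.0, 0.8]])
states = [np.array([1.0, 1.0])]
for _ in range(9):
    states.append(a_true @ states[-1])
series = np.vstack(states)
a_hat = fit_lti(series)
print(np.round(a_hat, 3))
print(trajectory_kernel(a_hat, series[0], a_true, series[0]))
```

Kernel values between patients computed this way form a Gram matrix that a kernel classifier can use to separate responders from non-responders.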


Subject(s)
Gene Expression Profiling/statistics & numerical data , Artificial Intelligence , Computational Biology , Databases, Genetic , Humans , Linear Models , Models, Genetic , Multiple Sclerosis/drug therapy , Multiple Sclerosis/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Pharmacogenetics/statistics & numerical data , Time Factors
7.
Bioinformatics ; 21 Suppl 1: i47-56, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15961493

ABSTRACT

MOTIVATION: Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class membership of enzymes and non-enzymes using graph kernels and support vector machine classification on these protein graphs. RESULTS: Our graph model, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets. If we include this extra information in our graph model, our classifier yields significantly higher accuracy levels than the vector models. Hyperkernels allow us to select and to optimally combine the most relevant node attributes in our protein graphs. We have laid the foundation for a protein function prediction system that integrates protein information from various sources efficiently and effectively. AVAILABILITY: More information is available via www.dbs.ifi.lmu.de/Mitarbeiter/borgwardt.html.
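Random walk graph kernels on attributed graphs compare two graphs by pairing walks of equal length in both and scoring each pair by how similar the visited nodes are. The following is a minimal sketch. It assumes each node carries a vector of numeric attributes and that the node kernel is a weighted sum of per-attribute Gaussian kernels. The weights are fixed by hand here, whereas the abstract describes selecting and combining attributes with hyperkernels. ProteinGraph, make_graph, the attribute values, sigma, and the walk length are hypothetical. The resulting Gram matrix can be passed to an SVM that accepts precomputed kernels, such as scikit-learn's SVC(kernel="precomputed").

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class ProteinGraph(NamedTuple):
    attrs: List[Sequence[float]]  # one attribute vector per node (hypothetical features)
    neighbors: List[List[int]]    # adjacency lists


def make_graph(attrs: Sequence[Sequence[float]], edges: Sequence[Tuple[int, int]]) -> ProteinGraph:
    neighbors: List[List[int]] = [[] for _ in attrs]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return ProteinGraph(list(attrs), neighbors)


def node_kernel(x: Sequence[float], y: Sequence[float], weights: Sequence[float], sigma: float = 1.0) -> float:
    """Weighted sum of per-attribute Gaussian kernels (weights fixed, not learned)."""
    return sum(w * math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for w, a, b in zip(weights, x, y))


def attributed_walk_kernel(g1: ProteinGraph, g2: ProteinGraph, weights: Sequence[float],
                           length: int = 2, sigma: float = 1.0) -> float:
    """Sum over all pairs of walks with exactly `length` steps of the product of node-kernel values."""
    kn = [[node_kernel(x, y, weights, sigma) for y in g2.attrs] for x in g1.attrs]
    scores = [row[:] for row in kn]  # walks of length 0: single node pairs
    for _ in range(length):
        # Extend every walk pair by one step to a pair of neighbors.
        scores = [[kn[u][v] * sum(scores[u2][v2] for u2 in g1.neighbors[u] for v2 in g2.neighbors[v])
                   for v in range(len(g2.attrs))]
                  for u in range(len(g1.attrs))]
    return sum(map(sum, scores))


def gram_matrix(graphs: Sequence[ProteinGraph], weights: Sequence[float],
                length: int = 2, sigma: float = 1.0) -> List[List[float]]:
    return [[attributed_walk_kernel(g, h, weights, length, sigma) for h in graphs] for g in graphs]


g_a = make_graph([[0.2, 1.0], [0.4, 0.5], [0.9, 0.3]], [(0, 1), (1, 2)])
g_b = make_graph([[0.3, 0.9], [0.8, 0.4]], [(0, 1)])
K = gram_matrix([g_a, g_b], weights=[0.5, 0.5])
print(K)
```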


Subject(s)
Computational Biology/methods , Enzymes/chemistry , Algorithms , Databases, Protein , Models, Statistical , Protein Conformation , Protein Structure, Secondary , Sequence Analysis, Protein/methods , Software