Results 1 - 8 of 8
1.
IEEE Trans Pattern Anal Mach Intell ; 31(6): 1048-58, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19372609

ABSTRACT

As a fundamental problem in pattern recognition, graph matching has applications in a variety of fields, from computer vision to computational biology. In graph matching, patterns are modeled as graphs and pattern recognition amounts to finding a correspondence between the nodes of different graphs. Many formulations of this problem can be cast in general as a quadratic assignment problem, where a linear term in the objective function encodes node compatibility and a quadratic term encodes edge compatibility. The main research focus in this area has been the design of efficient algorithms for approximately solving the quadratic assignment problem, since it is NP-hard. In this paper we turn our attention to a different question: how to estimate compatibility functions such that the solution of the resulting graph matching problem best matches the expected solution that a human would manually provide. We present a method for learning graph matching: the training examples are pairs of graphs and the 'labels' are matches between them. Our experimental results reveal that learning can substantially improve the performance of standard graph matching algorithms. In particular, we find that simple linear assignment with such a learning scheme outperforms Graduated Assignment with bistochastic normalisation, a state-of-the-art quadratic assignment relaxation algorithm.
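
The linear-assignment baseline mentioned in the abstract can be made concrete with a small sketch. This is not the authors' method: it only shows the inference step, solving a linear assignment problem with the Hungarian algorithm over a node-compatibility matrix built from an illustrative weight vector w; in the paper's setting w would be learned from example matches, and that learning step is omitted here.

```python
# Minimal sketch of linear-assignment graph matching with a weighted
# node-compatibility function. The feature map and the weight vector w
# (which would be learned from labelled graph pairs) are placeholders.
import numpy as np
from scipy.optimize import linear_sum_assignment

def node_compatibility(feats_g, feats_h, w):
    """Weighted compatibility: higher values mean more compatible nodes."""
    # feats_g: (n, d), feats_h: (m, d), w: (d,)
    diff = np.abs(feats_g[:, None, :] - feats_h[None, :, :])  # (n, m, d)
    return -(diff * w).sum(axis=-1)                            # (n, m)

def match_nodes(feats_g, feats_h, w):
    """Solve the linear assignment relaxation with the Hungarian algorithm."""
    cost = -node_compatibility(feats_g, feats_h, w)  # minimise negative score
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.normal(size=(5, 3))                                # 5 nodes, 3 features
    h = g[[2, 0, 4, 1, 3]] + 0.05 * rng.normal(size=(5, 3))    # permuted, noisy copy
    w = np.ones(3)                                             # would be learned
    print(match_nodes(g, h, w))   # pairs (i, j): g node i matched to h node j
```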


Subjects
Algorithms , Artificial Intelligence , Computer-Assisted Image Interpretation/methods , Three-Dimensional Imaging/methods , Automated Pattern Recognition/methods , Subtraction Technique , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
2.
Bioinformatics ; 23(13): i490-8, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17646335

ABSTRACT

MOTIVATION: Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. The ultimate goal is to detect the genes that are involved in disease outbreak and progression. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between different methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical connections and the differences between these methods. In this article, we define a kernel-based framework for feature selection based on the Hilbert-Schmidt independence criterion and backward elimination, called BAHSIC. We show that several well-known feature selectors are instances of BAHSIC, thereby clarifying their relationship. Furthermore, by choosing a different kernel, BAHSIC allows us to easily define novel feature selection algorithms. As a further advantage, feature selection via BAHSIC works directly on multiclass problems.

RESULTS: In a broad experimental evaluation, the members of the BAHSIC family reach high levels of accuracy and robustness when compared to other feature selection techniques. Experiments show that features selected with a linear kernel provide the best classification performance in general, but if strong non-linearities are present in the data then non-linear kernels can be more suitable.

AVAILABILITY: Accompanying homepage is http://www.dbs.ifi.lmu.de/~borgward/BAHSIC.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
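
As a rough illustration of the idea, not the released BAHSIC code, the sketch below computes a biased HSIC estimate with linear kernels on both features and labels and greedily removes the feature whose deletion hurts HSIC the least; the actual method supports arbitrary kernels and typically removes features in batches.

```python
# Minimal sketch of HSIC-based backward elimination in the spirit of
# BAHSIC, assuming linear kernels; one feature is dropped per iteration.
import numpy as np

def hsic(X, Y):
    """Biased HSIC estimate with linear kernels K = XX^T, L = YY^T."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    K, L = X @ X.T, Y @ Y.T
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def backward_elimination(X, Y, n_keep):
    """Greedily drop the feature whose removal hurts HSIC the least."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        scores = [hsic(X[:, [f for f in remaining if f != j]], Y)
                  for j in remaining]
        remaining.pop(int(np.argmax(scores)))   # best HSIC after removal
    return remaining

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.choice([-1.0, 1.0], size=(100, 1))
    X = rng.normal(size=(100, 10))
    X[:, 3] += 2 * y[:, 0]                       # only feature 3 is informative
    print(backward_elimination(X, y, n_keep=2))  # feature 3 should survive
```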


Subjects
Algorithms , Tumor Biomarkers/analysis , Computer-Assisted Diagnosis/methods , Gene Expression Profiling/methods , Neoplasm Proteins/analysis , Neoplasms/diagnosis , Neoplasms/metabolism , Humans , Oligonucleotide Array Sequence Analysis/methods , Sensitivity and Specificity
3.
Bioinformatics ; 22(14): e49-57, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16873512

ABSTRACT

MOTIVATION: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but strings, sequences, graphs, and other common structured data types arising in molecular biology.

RESULTS: We study the practical feasibility of an MMD-based test on three central data integration tasks: Testing cross-platform comparability of microarray data, cancer diagnosis, and data-content based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors.

CONCLUSIONS: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments.

AVAILABILITY: http://www.dbs.ifi.lmu.de/~borgward/MMD.
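
A minimal sketch of such a test follows, assuming vectorial data, an RBF kernel with a hand-picked bandwidth, and a permutation-based null distribution; the paper also derives analytic thresholds, which are not reproduced here.

```python
# Minimal sketch of an MMD two-sample test with an RBF kernel and a
# permutation test. Bandwidth and permutation count are illustrative.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean())

def permutation_test(X, Y, sigma=1.0, n_perm=200, seed=0):
    """p-value: how often does a random relabelling give a larger MMD^2?"""
    rng = np.random.default_rng(seed)
    observed = mmd2(X, Y, sigma)
    pooled = np.vstack([X, Y])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        Xp, Yp = pooled[idx[:len(X)]], pooled[idx[len(X):]]
        count += int(mmd2(Xp, Yp, sigma) >= observed)
    return (count + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 1.0, size=(80, 5))
    Y = rng.normal(0.5, 1.0, size=(80, 5))   # shifted mean
    print(permutation_test(X, Y))             # small p-value expected
```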


Subjects
Algorithms , Computational Biology/methods , Statistical Data Interpretation , Factual Databases , Information Storage and Retrieval/methods , Biological Models , Statistical Models , Computer Simulation , Sample Size , Statistical Distributions , Systems Integration
4.
Neural Netw ; 17(1): 127-41, 2004 Jan.
Article in English | MEDLINE | ID: mdl-14690713

ABSTRACT

In Support Vector (SV) regression, a parameter nu controls the number of Support Vectors and the number of points that come to lie outside of the so-called epsilon-insensitive tube. For various noise models and SV parameter settings, we experimentally determine the values of nu that lead to the lowest generalization error. We find good agreement with the values that had previously been predicted by a theoretical argument based on the asymptotic efficiency of a simplified model of SV regression. As a side effect, the experiments also yield valuable information about the generalization behavior of the remaining SVM parameters and their dependencies. The experimental findings hold even for complex 'real-world' data sets. Based on our results on the role of the nu-SVM parameters, we discuss various model selection methods.
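
The role of nu can be seen in a few lines using scikit-learn's NuSVR rather than the authors' original setup; the target function, noise level, and parameter grid below are illustrative choices only. The reported fraction of support vectors should track nu, which is the behaviour the abstract refers to.

```python
# Minimal sketch: how nu bounds the fraction of support vectors in
# nu-SV regression. Data and hyperparameters are illustrative.
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(200, 1)), axis=0)
y = np.sinc(X[:, 0]) + 0.1 * rng.normal(size=200)   # sinc target + Gaussian noise

for nu in (0.1, 0.3, 0.5, 0.8):
    model = NuSVR(nu=nu, C=1.0, kernel="rbf", gamma=1.0).fit(X, y)
    frac_sv = len(model.support_) / len(X)
    print(f"nu={nu:.1f}  fraction of support vectors={frac_sv:.2f}")
```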


Subjects
Theoretical Models , Neural Networks (Computer) , Nonlinear Dynamics , Regression Analysis , Generalization (Psychology) , Normal Distribution
6.
IEEE Trans Pattern Anal Mach Intell ; 32(10): 1809-21, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20724758

ABSTRACT

Object matching is a fundamental operation in data analysis. It typically requires the definition of a similarity measure between the classes of objects to be matched. Instead, we develop an approach which is able to perform matching by requiring a similarity measure only within each of the classes. This is achieved by maximizing the dependency between matched pairs of observations by means of the Hilbert-Schmidt Independence Criterion. The resulting problem can be cast as a quadratic assignment problem with special structure, and we present a simple algorithm for finding a locally optimal solution.
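
A minimal sketch of this kind of dependence-maximising matching follows, assuming vectorial data and centred linear kernels; it repeatedly linearises the quadratic objective and solves a linear assignment problem, which is one simple way to reach a locally optimal permutation and not necessarily the paper's exact procedure.

```python
# Minimal sketch of HSIC-based object matching: local ascent on
# trace(K P L P^T) over permutation matrices P, with P[i, p[i]] = 1.
import numpy as np
from scipy.optimize import linear_sum_assignment

def centred_linear_kernel(X):
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ (X @ X.T) @ H

def hsic_match(X, Y, n_iter=20, seed=0):
    """Return a permutation p such that X[i] is matched to Y[p[i]]."""
    K, L = centred_linear_kernel(X), centred_linear_kernel(Y)
    n = K.shape[0]
    p = np.random.default_rng(seed).permutation(n)
    for _ in range(n_iter):
        P = np.eye(n)[p]                         # permutation matrix
        grad = K @ P @ L                         # gradient of trace(K P L P^T)
        _, p_new = linear_sum_assignment(-grad)  # maximise the linearisation
        if np.array_equal(p_new, p):
            break
        p = p_new
    return p

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(30, 4))
    true = rng.permutation(30)
    Y = X[true] + 0.01 * rng.normal(size=(30, 4))   # permuted, lightly noised copy
    inv = np.argsort(true)                          # X[i] corresponds to Y[inv[i]]
    # Fraction of correctly matched pairs; a local method, so possibly below 1.
    print(np.mean(hsic_match(X, Y) == inv))
```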

7.
Bioinformatics ; 21 Suppl 1: i47-56, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15961493

ABSTRACT

MOTIVATION: Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class membership of enzymes and non-enzymes using graph kernels and support vector machine classification on these protein graphs.

RESULTS: Our graph model, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets. If we include this extra information into our graph model, our classifier yields significantly higher accuracy levels than the vector models. Hyperkernels allow us to select and to optimally combine the most relevant node attributes in our protein graphs. We have laid the foundation for a protein function prediction system that integrates protein information from various sources efficiently and effectively.

AVAILABILITY: More information available via www.dbs.ifi.lmu.de/Mitarbeiter/borgwardt.html.
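
To make the pipeline concrete, here is a toy sketch of graph-kernel classification with an SVM. The kernel (inner products of node-label and edge-label-pair histograms), the tiny hand-made "protein" graphs, and the enzyme/non-enzyme labels are illustrative stand-ins, far simpler than the protein graphs and kernels used in the paper.

```python
# Toy graph kernel + SVM classification. Graphs are compared by inner
# products of node-label and edge-label-pair histograms.
from collections import Counter
import numpy as np
from sklearn.svm import SVC

def histogram_kernel(g1, g2):
    """g = (node_labels: dict[node, str], edges: list[(u, v)])."""
    def features(g):
        labels, edges = g
        feats = Counter(labels.values())                                  # node labels
        feats.update(tuple(sorted((labels[u], labels[v]))) for u, v in edges)  # edge label pairs
        return feats
    f1, f2 = features(g1), features(g2)
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())

def gram_matrix(graphs):
    K = np.array([[histogram_kernel(a, b) for b in graphs] for a in graphs],
                 dtype=float)
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)                                             # normalise

if __name__ == "__main__":
    # Tiny made-up "protein" graphs; node labels mimic structural tags.
    graphs = [
        ({0: "H", 1: "H", 2: "S"}, [(0, 1), (1, 2)]),
        ({0: "H", 1: "H", 2: "S", 3: "S"}, [(0, 1), (1, 2), (2, 3)]),
        ({0: "S", 1: "S", 2: "L"}, [(0, 1), (1, 2)]),
        ({0: "S", 1: "L", 2: "L"}, [(0, 1), (1, 2)]),
    ]
    y = np.array([1, 1, 0, 0])                        # enzyme vs non-enzyme
    K = gram_matrix(graphs)
    clf = SVC(kernel="precomputed").fit(K, y)
    print(clf.predict(K))                             # training-set predictions
```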


Subjects
Computational Biology/methods , Enzymes/chemistry , Algorithms , Protein Databases , Statistical Models , Protein Conformation , Protein Secondary Structure , Protein Sequence Analysis/methods , Software
8.
Neural Netw ; 11(4): 637-649, 1998 Jun.
Article in English | MEDLINE | ID: mdl-12662802

ABSTRACT

In this paper a correspondence is derived between regularization operators used in regularization networks and support vector kernels. We prove that the Green's functions associated with regularization operators are suitable support vector kernels with equivalent regularization properties. Moreover, the paper provides an analysis of currently used support vector kernels in view of regularization theory, together with the corresponding operators associated with the classes of both polynomial kernels and translation-invariant kernels. The latter are also analyzed on periodic domains. As a by-product we show that a large number of radial basis functions, namely conditionally positive definite functions, may be used as support vector kernels.
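
A small numerical illustration of the final claim, using illustrative choices rather than anything from the paper: the negative Euclidean distance, a conditionally positive definite function, gives an indefinite kernel matrix in general, but becomes positive semidefinite once coefficients are constrained to sum to zero, which is the constraint the bias term introduces in SV expansions. A Gaussian kernel is included for contrast as an unconditionally positive definite case.

```python
# Numerical check of (conditional) positive definiteness on random data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances

K_gauss = np.exp(-D ** 2 / 2)             # positive definite kernel
K_dist = -D                               # only conditionally positive definite
H = np.eye(50) - np.ones((50, 50)) / 50   # projector onto {c : sum(c) = 0}

print("min eig, Gaussian kernel:        ", np.linalg.eigvalsh(K_gauss).min())
print("min eig, negative distance:      ", np.linalg.eigvalsh(K_dist).min())
print("min eig, projected neg. distance:", np.linalg.eigvalsh(H @ K_dist @ H).min())
# Expected pattern (up to floating-point error): >= 0, clearly < 0, >= 0.
```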
