Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 6 de 6
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Neural Comput ; 27(10): 2039-96, 2015 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-26313601

RESUMO

Efficient learning of a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. Similarities and dissimilarities are, however, often naturally obtained by nonmetric proximity measures that cannot easily be handled by classical learning algorithms. Major efforts have been undertaken to provide approaches that can either directly be used for such data or to make standard methods available for these types of data. We provide a comprehensive survey for the field of learning with nonmetric proximities. First, we introduce the formalism used in nonmetric spaces and motivate specific treatments for nonmetric proximity data. Second, we provide a systematization of the various approaches. For each category of approaches, we provide a comparative discussion of the individual algorithms and address complexity issues and generalization properties. In a summarizing section, we provide a larger experimental study for the majority of the algorithms on standard data sets. We also address the problem of large-scale proximity learning, which is often overlooked in this context and of major importance to make the method relevant in practice. The algorithms we discuss are in general applicable for proximity-based clustering, one-class classification, classification, regression, and embedding approaches. In the experimental part, we focus on classification tasks.

2.
Brief Bioinform ; 9(2): 129-43, 2008 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-18334515

RESUMO

In the present contribution we propose two recently developed classification algorithms for the analysis of mass-spectrometric data-the supervised neural gas and the fuzzy-labeled self-organizing map. The algorithms are inherently regularizing, which is recommended, for these spectral data because of its high dimensionality and the sparseness for specific problems. The algorithms are both prototype-based such that the principle of characteristic representants is realized. This leads to an easy interpretation of the generated classifcation model. Further, the fuzzy-labeled self-organizing map is able to process uncertainty in data, and classification results can be obtained as fuzzy decisions. Moreover, this fuzzy classification together with the property of topographic mapping offers the possibility of class similarity detection, which can be used for class visualization. We demonstrate the power of both methods for two exemplary examples: the classification of bacteria (listeria types) and neoplastic and non-neoplastic cell populations in breast cancer tissue sections.


Assuntos
Algoritmos , Interpretação Estatística de Dados , Espectrometria de Massas/métodos , Proteômica , Bactérias/classificação , Mama , Metodologias Computacionais , Feminino , Lógica Fuzzy , Humanos , Matemática , Modelos Estatísticos , Neoplasias/patologia , Proteômica/instrumentação , Proteômica/métodos
3.
Methods Mol Biol ; 1362: 185-95, 2016.
Artigo em Inglês | MEDLINE | ID: mdl-26519178

RESUMO

Sequence data are widely used to get a deeper insight into biological systems. From a data analysis perspective they are given as a set of sequences of symbols with varying length. In general they are compared using nonmetric score functions. In this form the data are nonstandard, because they do not provide an immediate metric vector space and their analysis using standard methods is complicated. In this chapter we provide various strategies for how to analyze these type of data in a mathematically accurate way instead of the often seen ad hoc solutions. Our approach is based on the scoring values from protein sequence data although be applicable in a broader sense. We discuss potential recoding concepts of the scores and discuss algorithms to solve clustering, classification and embedding tasks for score data for a protein sequence application.


Assuntos
Algoritmos , Biologia Computacional/métodos , Análise de Sequência de Proteína/métodos
4.
Int J Neural Syst ; 22(5): 1250021, 2012 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-22931439

RESUMO

Prototype based learning offers an intuitive interface to inspect large quantities of electronic data in supervised or unsupervised settings. Recently, many techniques have been extended to data described by general dissimilarities rather than Euclidean vectors, so-called relational data settings. Unlike the Euclidean counterparts, the techniques have quadratic time complexity due to the underlying quadratic dissimilarity matrix. Thus, they are infeasible already for medium sized data sets. The contribution of this article is twofold: On the one hand we propose a novel supervised prototype based classification technique for dissimilarity data based on popular learning vector quantization (LVQ), on the other hand we transfer a linear time approximation technique, the Nyström approximation, to this algorithm and an unsupervised counterpart, the relational generative topographic mapping (GTM). This way, linear time and space methods result. We evaluate the techniques on three examples from the biomedical domain.


Assuntos
Inteligência Artificial , Algoritmos , Cromossomos/genética , Análise por Conglomerados , Biologia Computacional , Interpretação Estatística de Dados , Bases de Dados Genéticas , Humanos , Modelos Lineares , Modelos Estatísticos , Distribuição Normal , Software , Máquina de Vetores de Suporte , Fatores de Tempo , Vibrio/genética
5.
Neural Netw ; 26: 159-73, 2012 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-22041220

RESUMO

We present an extension of the recently introduced Generalized Matrix Learning Vector Quantization algorithm. In the original scheme, adaptive square matrices of relevance factors parameterize a discriminative distance measure. We extend the scheme to matrices of limited rank corresponding to low-dimensional representations of the data. This allows to incorporate prior knowledge of the intrinsic dimension and to reduce the number of adaptive parameters efficiently. In particular, for very large dimensional data, the limitation of the rank can reduce computation time and memory requirements significantly. Furthermore, two- or three-dimensional representations constitute an efficient visualization method for labeled data sets. The identification of a suitable projection is not treated as a pre-processing step but as an integral part of the supervised training. Several real world data sets serve as an illustration and demonstrate the usefulness of the suggested method.


Assuntos
Inteligência Artificial , Aprendizagem , Algoritmos , Análise Discriminante , Humanos , Reconhecimento Automatizado de Padrão
6.
Artif Intell Med ; 45(2-3): 215-28, 2009.
Artigo em Inglês | MEDLINE | ID: mdl-18778925

RESUMO

OBJECTIVE: Mass spectrometry has become a standard technique to analyze clinical samples in cancer research. The obtained spectrometric measurements reveal a lot of information of the clinical sample at the peptide and protein level. The spectra are high dimensional and, due to the small number of samples a sparse coverage of the population is very common. In clinical research the calculation and evaluation of classification models is important. For classical statistics this is achieved by hypothesis testing with respect to a chosen level of confidence. In clinical proteomics the application of statistical tests is limited due to the small number of samples and the high dimensionality of the data. Typically soft methods from the field of machine learning are used to generate such models. However for these methods no or only few additional information about the safety of the model decision is available. In this contribution the spectral data are processed as functional data and conformal classifier models are generated. The obtained models allow the detection of potential biomarker candidates and provide confidence measures for the classification decision. METHODS: First, wavelet-based techniques for the efficient processing and encoding of mass spectrometric measurements from clinical samples are presented. A prototype-based classifier is extended by a functional metric and combined with the concept of conformal prediction to classify the clinical proteomic spectra and to evaluate the results. RESULTS: Clinical proteomic data of a colorectal cancer and a lung cancer study are used to test the performance of the proposed algorithm. The prototype classifiers are evaluated with respect to prediction accuracy and the confidence of the classification decisions. The adapted metric parameters are analyzed and interpreted to find potential biomarker candidates. CONCLUSIONS: The proposed algorithm can be used to analyze functional data as obtained from clinical mass spectrometry, to find discriminating mass positions and to judge the confidence of the obtained classifications, providing robust and interpretable classification models.


Assuntos
Biologia Computacional , Espectrometria de Massas/métodos , Modelos Teóricos
SELEÇÃO DE REFERÊNCIAS
Detalhe da pesquisa