ABSTRACT
We present a conceptual framework for the development of visual interactive techniques to formalize and externalize trust in machine learning (ML) workflows. Currently, trust in ML applications is an implicit process that takes place in the user's mind. As such, there is no method of feedback or communication of trust that can be acted upon. Our framework will be instrumental in developing interactive visualization approaches that will help users to efficiently and effectively build and communicate trust in ways that fit each of the ML process stages. We formulate several research questions and directions that include: 1) a typology/taxonomy of trust objects, trust issues, and possible reasons for (mis)trust; 2) formalisms to represent trust in machine-readable form; 3) means by which users can express their state of trust by interacting with a computer system (e.g., text, drawing, marking); 4) ways in which a system can facilitate users' expression and communication of the state of trust; and 5) creation of visual interactive techniques for representation and exploration of trust over all stages of an ML pipeline.
ABSTRACT
Dimension reduction (DR) computes faithful low-dimensional (LD) representations of high-dimensional (HD) data. Outstanding performance is achieved by recent neighbor embedding (NE) algorithms such as t-SNE, which mitigate the curse of dimensionality. The single-scale or multiscale nature of an NE scheme drives how HD neighborhoods are preserved in the LD space (LDS). While single-scale methods focus on neighborhoods of a single size through the concept of perplexity, multiscale ones preserve neighborhoods over a broader range of sizes and account for the global HD organization to define the LDS. For both single-scale and multiscale methods, however, the time complexity in the number of samples is unaffordable for big data sets. Single-scale methods can be accelerated by exploiting the inherent sparsity of the HD similarities they involve. The dense structure of the multiscale HD similarities, on the other hand, prevents fast multiscale schemes from being developed in a similar way. This article addresses this difficulty by designing randomized accelerations of the multiscale methods. To account for all levels of interaction, the HD data are first subsampled at different scales, which makes it possible to identify small but relevant neighbor sets for each data point thanks to vantage-point trees. Afterward, these sets are employed with a Barnes-Hut algorithm to cheaply evaluate the considered cost function and its gradient, enabling large-scale use of multiscale NE schemes. Extensive experiments demonstrate that the proposed accelerations are, with statistical significance, both faster than the original multiscale methods by orders of magnitude and better at preserving HD neighborhoods than state-of-the-art single-scale schemes, leading to high-quality LD embeddings. Public codes are freely available at https://github.com/cdebodt.
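As an illustration of the neighbor-set construction described above, the following sketch subsamples a synthetic data set at a few scales and collects, for every point, nearest neighbors found within each subsample. It uses scikit-learn's BallTree as a stand-in for the vantage-point trees, and the subsample sizes, neighborhood sizes and data are placeholder choices, not the authors' implementation; the Barnes-Hut optimization itself is not reproduced.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 50))             # placeholder high-dimensional data set
n = X.shape[0]

# Subsample the data at a few decreasing rates to cover several neighborhood scales.
scale_sizes = [n, n // 10, n // 100]        # hypothetical subsample sizes
neighbor_sets = [set() for _ in range(n)]

for size in scale_sizes:
    idx = rng.choice(n, size=size, replace=False)
    tree = BallTree(X[idx])                 # stand-in for a vantage-point tree
    # Keep a few nearest neighbors of every point found within this subsample.
    _, knn = tree.query(X, k=min(10, size))
    for i in range(n):
        neighbor_sets[i].update(int(idx[j]) for j in knn[i] if idx[j] != i)

# neighbor_sets[i] gathers candidate neighbors of point i across all scales; a
# Barnes-Hut-style optimizer would restrict the cost and gradient to these sets.
print("average neighbor-set size:", np.mean([len(s) for s in neighbor_sets]))
```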
ABSTRACT
Thermosensation is crucial for humans to probe the environment and detect threats arising from noxious heat or cold. Over recent years, EEG frequency-tagging using long-lasting periodic radiant heat stimulation has been proposed as a means to study the cortical processes underlying tonic heat perception. This approach is based on the notion that periodic modulation of a sustained stimulus can elicit synchronized periodic activity in the neuronal populations responding to the stimulus, known as a steady-state response (SSR). In this paper, we extend this approach using a contact thermode to generate both heat- and cold-evoked SSRs. Furthermore, we characterize the temporal dynamics of the elicited responses, relate these dynamics to perception, and assess the effects of displacing the stimulated skin surface to gain insight into the heat- and cold-sensitive afferents conveying these responses. Two experiments were conducted in healthy volunteers. In both experiments, noxious heat and innocuous cool stimuli were applied for 75 seconds to the forearm using a Peltier-based contact thermode, with intensities varying sinusoidally at 0.2 Hz. Displacement of the thermal stimulation on the skin surface was achieved by independently controlling the Peltier elements of the thermal probe. Continuous intensity ratings of sustained heat and cold stimulation were obtained in the first experiment with 14 subjects, and the EEG was recorded in the second experiment with 15 subjects. Both contact heat and cool stimulation elicited periodic EEG responses and percepts. Compared to heat stimulation, the responses to cool stimulation had a lower magnitude and a shorter latency. All responses tended to habituate over time, and this response attenuation was most pronounced for cool compared to heat stimulation, and for stimulation delivered to a fixed surface compared to a variable surface.
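The following minimal sketch illustrates how a steady-state response at the 0.2 Hz tagging frequency could be quantified from a single EEG channel, on a synthetic signal: the response amplitude is read from the Fourier bin at the stimulation frequency and compared with neighboring bins. The sampling rate, noise level and signal-to-noise measure are assumptions for illustration, not the analysis pipeline of the study.

```python
import numpy as np

fs = 250.0                       # assumed sampling rate (Hz)
f_stim = 0.2                     # stimulation (tagging) frequency (Hz)
t = np.arange(0, 75.0, 1.0 / fs) # 75 s of signal, matching the stimulation duration

rng = np.random.default_rng(1)
# Synthetic EEG: a small periodic response at 0.2 Hz buried in noise.
eeg = 0.5 * np.sin(2 * np.pi * f_stim * t) + rng.normal(scale=2.0, size=t.size)

spectrum = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)

k = np.argmin(np.abs(freqs - f_stim))          # bin at the tagging frequency
neighbors = np.r_[k - 5:k - 1, k + 2:k + 6]    # surrounding bins as a noise estimate
snr = spectrum[k] / spectrum[neighbors].mean() # signal-to-noise ratio of the SSR
print(f"SSR amplitude {spectrum[k]:.3f} at {freqs[k]:.2f} Hz, SNR = {snr:.1f}")
```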
Subjects
Cold Temperature, Electroencephalography, Hot Temperature, Perception, Computer-Assisted Signal Processing, Thermosensing/physiology, Adult, Alpha Rhythm/physiology, Analysis of Variance, Female, Psychophysiologic Habituation, Humans, Male, Time Factors, Young Adult
ABSTRACT
Dimensionality reduction (DR) aims at faithfully and meaningfully representing high-dimensional (HD) data in a low-dimensional (LD) space. Recently developed neighbor embedding DR methods lead to outstanding performance, thanks to their ability to foil the curse of dimensionality. Unfortunately, they cannot be directly employed on incomplete data sets, which are becoming ubiquitous in machine learning. Discarding samples with missing features prevents computing their LD coordinates and degrades the treatment of the complete samples. Common missing-data imputation schemes are not appropriate in the nonlinear DR context either. Indeed, even if they model the data distribution in the feature space, they can, at best, enable the application of a DR scheme to the expected data set. In practice, one would instead like to obtain the LD embedding whose cost function value is, on average, closest to that of the complete data case. As state-of-the-art DR techniques are nonlinear, the latter embedding results from minimizing the expected cost function on the incomplete database, not from considering the expected data set. This paper addresses these limitations by developing a general methodology for nonlinear DR with missing data, directly applicable with any DR scheme that optimizes some criterion. In order to model the feature dependences, an HD extension of Gaussian mixture models is first fitted on the incomplete data set. It is afterward employed under the multiple imputation paradigm to obtain a single relevant LD embedding, thus minimizing the cost function expectation. Extensive experiments demonstrate the superiority of the suggested framework over alternative approaches.
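A small sketch of the expected-cost idea discussed above: several imputations of an incomplete data set are drawn, and a single low-dimensional configuration is optimized against the average stress over those imputations. scikit-learn's IterativeImputer with posterior sampling stands in for the paper's high-dimensional Gaussian mixture model, and a simple MDS-like stress replaces the neighbor embedding cost; all sizes are placeholders.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 8))
X[rng.random(X.shape) < 0.2] = np.nan        # 20% of the entries go missing

# Several imputations drawn from a fitted conditional model (a stand-in for
# sampling from the high-dimensional Gaussian mixture used in the paper).
imputations = [IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
               for s in range(5)]
D = [pdist(Xi) for Xi in imputations]

# A single 2-D embedding minimizes the *expected* stress over the imputations,
# rather than the stress of a single "expected" data set.
def expected_stress(y_flat):
    d = pdist(y_flat.reshape(-1, 2))
    return np.mean([np.sum((Di - d) ** 2) for Di in D])

y0 = rng.normal(scale=1e-2, size=X.shape[0] * 2)
res = minimize(expected_stress, y0, method="L-BFGS-B")
Y = res.x.reshape(-1, 2)                      # shared low-dimensional embedding
print("expected stress:", res.fun)
```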
ABSTRACT
A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model with increased robustness. Furthermore, the lower bound on the log-evidence is expected to be tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with higher confidence.
Subjects
Bayes Theorem, Cluster Analysis, Robotics, Algorithms, Humans, Learning, Normal Distribution
ABSTRACT
In spite of the numerous approaches that have been derived for solving the independent component analysis (ICA) problem, it remains interesting to develop new methods when, among other reasons, specific a priori knowledge may help to further improve separation performance. In this paper, the minimum-range approach to blind extraction of bounded sources is investigated. The relationship with other existing well-known criteria is established. It is proved that the minimum-range approach is a contrast, and that the criterion is discriminant in the sense that it is free of spurious maxima. Practical issues are also discussed, and an estimator of the range measure based on order statistics is proposed. An algorithm for contrast maximization over the group of special orthogonal matrices is proposed. Simulation results illustrate the performance of the algorithm when using the proposed range estimation criterion.
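A toy sketch of the minimum-range idea on a two-dimensional, pre-whitened mixture: the range of each output is estimated from order statistics (averaging the m smallest and m largest values), and the summed range is minimized over a one-parameter family of rotations. The source distributions, the quasi-range parameter and the grid search are illustrative assumptions, not the paper's algorithm over the special orthogonal group.

```python
import numpy as np

rng = np.random.default_rng(3)
# Two bounded (uniform) sources, mixed by a rotation (whitened mixing model).
S = rng.uniform(-1, 1, size=(2, 5000))
theta_true = 0.7
A = np.array([[np.cos(theta_true), -np.sin(theta_true)],
              [np.sin(theta_true),  np.cos(theta_true)]])
X = A @ S

def quasi_range(x, m=10):
    """Order-statistic estimate of the range: average the m largest and the
    m smallest values instead of using the two extreme values only."""
    xs = np.sort(x)
    return xs[-m:].mean() - xs[:m].mean()

def contrast(theta):
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Y = R @ X
    # Minimum-range criterion: the summed estimated range of the outputs.
    return quasi_range(Y[0]) + quasi_range(Y[1])

# The mixing is undone (up to source permutation and sign) when theta + theta_true
# is a multiple of pi/2, where the summed output range is minimal.
thetas = np.linspace(0, np.pi / 2, 500)
best = thetas[np.argmin([contrast(t) for t in thetas])]
print(f"estimated de-mixing angle: {best:.3f}, expected: {np.pi / 2 - theta_true:.3f}")
```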
Subjects
Algorithms, Artificial Intelligence, Decision Support Techniques, Information Storage and Retrieval/methods, Statistical Models, Automated Pattern Recognition/methods, Computer Simulation, Neural Networks (Computer), Principal Component Analysis
ABSTRACT
Mapping high-dimensional data into a low-dimensional space, for example for visualization, is a problem of growing importance in data analysis. This paper presents data-driven high-dimensional scaling (DD-HDS), a nonlinear mapping method in the line of the multidimensional scaling (MDS) approach, based on the preservation of distances between pairs of data. It improves on existing competitors in the representation of high-dimensional data in two ways. It introduces (1) a specific weighting of distances between data points that takes into account the concentration-of-measure phenomenon and (2) a symmetric handling of short distances in the original and output spaces, avoiding false neighbor representations while still allowing some necessary tears in the original distribution. More precisely, the weighting is set according to the effective distribution of distances in the data set, with the exception of a single user-defined parameter setting the tradeoff between local neighborhood preservation and global mapping. The stress criterion designed for the mapping is optimized by "force-directed placement" (FDP). Mappings of low- and high-dimensional data sets are presented as illustrations of the features and advantages of the proposed algorithm. The weighting function specific to high-dimensional data and the symmetric handling of short distances can easily be incorporated in most distance-preservation-based nonlinear dimensionality reduction methods.
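The following sketch conveys the flavor of a data-driven, weighted distance-preservation stress: weights are derived from the empirical distribution of the original distances, and each pair is weighted by a decreasing function of the smaller of its two distances so that proximity in either space matters. The weighting function, optimizer and data are simplified stand-ins, not DD-HDS's exact formulation or its force-directed placement.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 20))                 # placeholder high-dimensional data
D = pdist(X)                                  # original pairwise distances

# Simplified weighting driven by the empirical distribution of HD distances:
# pairs whose distance is small relative to that distribution get large weights.
mu, sigma = D.mean(), D.std()
def weight(d):
    return 1.0 / (1.0 + np.exp((d - mu) / sigma))   # decreasing, data-driven

def stress(y_flat):
    d = pdist(y_flat.reshape(-1, 2))
    # Symmetric handling of short distances: a pair is weighted by whichever of
    # its two distances (original or embedded) is smaller.
    w = weight(np.minimum(D, d))
    return np.sum(w * (D - d) ** 2)

y0 = rng.normal(scale=1e-2, size=X.shape[0] * 2)
Y = minimize(stress, y0, method="L-BFGS-B").x.reshape(-1, 2)
print("embedded shape:", Y.shape)
```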
Subjects
Algorithms, Artificial Intelligence, Computer Graphics, Data Display, Three-Dimensional Imaging/methods, Information Storage and Retrieval/methods, Theoretical Models, Computer Simulation
ABSTRACT
Clustering methods are commonly applied to time series, either as a preprocessing stage for other methods or in their own right. This paper explains why time series clustering may sometimes be considered meaningless. This problematic situation is illustrated on various raw time series. The unfolding preprocessing methodology is then introduced, and its usefulness is illustrated on various time series. The experimental results show that clustering becomes meaningful when applied to adequately unfolded time series.
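As one possible reading of the unfolding step, the sketch below rebuilds a scalar series into time-delay coordinate vectors before clustering them; the embedding dimension, delay and clustering settings are hypothetical choices for illustration, not the paper's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
t = np.arange(2000)
series = np.sin(0.05 * t) + 0.1 * rng.normal(size=t.size)   # toy raw time series

def delay_embed(x, dim=3, delay=5):
    """Unfold a scalar series into time-delay coordinate vectors."""
    n = x.size - (dim - 1) * delay
    return np.column_stack([x[i * delay: i * delay + n] for i in range(dim)])

Z = delay_embed(series)              # unfolded representation of the series
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))           # cluster sizes on the unfolded series
```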
Subjects
Cluster Analysis, Statistical Data Interpretation, Computer-Assisted Signal Processing, Humans, Nonlinear Dynamics, Time Factors
ABSTRACT
Extreme learning machines (ELMs) are fast methods that obtain state-of-the-art results in regression. However, they are not robust to outliers, and the selection of their meta-parameter (i.e., the number of neurons for standard ELMs and the regularization constant of the output weights for L2-regularized ELMs) is biased by such instances. This paper proposes a new robust inference algorithm for ELMs based on the pointwise probability reinforcement methodology. Experiments show that the proposed approach produces results comparable to the state of the art, while often being faster.
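To make the meta-parameters mentioned above concrete, here is a minimal sketch of a standard L2-regularized ELM fit on synthetic data: a random, untrained hidden layer followed by a ridge-regression solve for the output weights. The robust pointwise-probability-reinforcement step proposed in the paper is not reproduced; all values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(500, 3))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2 + 0.05 * rng.normal(size=500)

n_hidden = 100                       # the meta-parameter of a standard ELM
W = rng.normal(size=(X.shape[1], n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)               # random, untrained hidden layer

lam = 1e-2                           # regularization constant (L2-regularized ELM)
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

y_hat = H @ beta
print("training MSE:", np.mean((y - y_hat) ** 2))
```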
ABSTRACT
Self-organizing maps (SOMs) are widely used in several fields of application, from neurobiology to multivariate data analysis. In that context, this paper presents variants of the classic SOM algorithm. With respect to the traditional SOM, the modifications concern the core of the algorithm (the learning rule), but do not alter the two main tasks it performs, i.e. vector quantization combined with topology preservation. After an intuitive justification based on geometrical considerations, three new rules are defined in addition to the original one. They exhibit interesting properties such as recursive neighborhood adaptation and non-radial neighborhood adaptation. In order to assess their relative performance and speed of convergence, the four rules are used to train several maps, and the results are compared according to several error measures (quantization error and topology preservation criteria).
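For reference, a compact sketch of the classic radial SOM learning rule that the proposed variants modify: the best-matching unit is found, and prototypes are pulled toward the sample with a Gaussian neighborhood factor on the map grid. Grid size, learning-rate and radius schedules, and the data are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(size=(1000, 2))            # toy data to quantize
grid = np.array([(i, j) for i in range(10) for j in range(10)])  # 10x10 map
W = rng.uniform(size=(100, 2))             # prototype vectors

n_steps = 5000
for t in range(n_steps):
    x = X[rng.integers(len(X))]
    bmu = np.argmin(np.sum((W - x) ** 2, axis=1))          # best-matching unit
    lr = 0.5 * (1 - t / n_steps)                           # decaying learning rate
    radius = 5.0 * (1 - t / n_steps) + 0.5                 # shrinking neighborhood
    h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2 * radius ** 2))
    W += lr * h[:, None] * (x - W)         # classic radial neighborhood update

print("mean quantization error:",
      np.mean(np.min(np.linalg.norm(X[:, None, :] - W[None], axis=2), axis=1)))
```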
Subjects
Neural Networks (Computer), Probability Learning, Algorithms
ABSTRACT
Results of neural network learning are always subject to some variability, due to the sensitivity to initial conditions, to convergence to local minima, and, sometimes more dramatically, to sampling variability. This paper presents a set of tools designed to assess the reliability of the results of self-organizing maps (SOMs), i.e. to test, on a statistical basis, the confidence we can have in the result of a specific SOM. The tools concern the quantization error in a SOM and the neighborhood relations (both at the level of a specific pair of observations and globally on the map). As a by-product, these measures also make it possible to assess the adequacy of the number of units chosen for a map. The tools may also be used to measure objectively how much less sensitive SOMs are to non-linear optimization problems (local minima, convergence, etc.) than other neural network models.
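A schematic sketch of a bootstrap-style reliability assessment of the quantization error, using scikit-learn's KMeans as a stand-in for SOM training to keep the example short: the quantizer is refit on resampled data and the spread of the quantization error is reported. This illustrates the resampling idea only, not the paper's statistical tests on neighborhood relations.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
X = rng.normal(size=(600, 5))                # placeholder data set
n_units = 25                                 # number of map units being assessed

errors = []
for b in range(50):                          # bootstrap resamples of the data
    idx = rng.integers(0, len(X), size=len(X))
    km = KMeans(n_clusters=n_units, n_init=5, random_state=b).fit(X[idx])
    # Quantization error of this resampled model, evaluated on the full data set.
    errors.append(km.transform(X).min(axis=1).mean())

errors = np.array(errors)
print(f"quantization error: {errors.mean():.3f} +/- {errors.std():.3f}")
```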
Subjects
Neural Networks (Computer), Databases as Topic/statistics & numerical data, Reproducibility of Results
ABSTRACT
The Kohonen self-organizing map is usually considered a classification or clustering tool, with only a few applications in time series prediction. In this paper, a particular time series forecasting method based on Kohonen maps is described. This method has been specifically designed for the prediction of long-term trends. A proof of the stability of the method for long-term forecasting is given, together with illustrations of its use in both the scalar and vector cases.
Subjects
Forecasting, Neural Networks (Computer), Monte Carlo Method, Poland, Power Plants
ABSTRACT
Within the framework of the OPTIVIP project, an optic-nerve-based visual prosthesis is being developed in order to restore partial vision to the blind. One of the main challenges is to understand, decode and model the physiological process linking the stimulation parameters to the visual sensations produced in the visual field of a blind volunteer. We propose to use adaptive neural techniques. Two prediction models are investigated. The first one is a grey-box model exploiting the neurophysiological knowledge available to date. It combines a neurophysiological model with artificial neural networks, such as multi-layer perceptrons and radial basis function networks, in order to predict the features of the visual perceptions. The second model is entirely of the black-box type. We show that both models provide satisfactory prediction tools and achieve similar prediction accuracies. Moreover, a significant improvement (25%) was obtained over linear statistical methods, suggesting that the biological process is strongly non-linear.
Subjects
Blindness/rehabilitation, Psychological Models, Neural Networks (Computer), Prostheses and Implants, Visual Perception, Forecasting, Humans, Optic Nerve/pathology, Optic Nerve/physiology
ABSTRACT
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be with respect to their theoretical probability, and include, e.g., outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation controls the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can easily be used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually, since an abnormality degree is obtained for each observation.
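One plausible, simplified instantiation of the PPR idea for estimating a Gaussian mean and scale: each observation's probability is augmented by a non-negative reinforcement, a regularisation constant limits the total reinforcement, and the two sets of quantities are updated alternately. The closed-form updates and the value of the constant are schematic assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
x = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(8.0, 0.5, 5)])  # 5 AFDs

lam = 2.0                        # regularisation controlling the total reinforcement
mu, sigma = x.mean(), x.std()    # start from the outlier-biased plain estimates

for _ in range(50):
    p = norm.pdf(x, mu, sigma)
    r = np.maximum(0.0, 1.0 / lam - p)   # pointwise reinforcement (closed form here)
    w = p / (p + r)                      # strongly reinforced points get low weight
    mu = np.sum(w * x) / np.sum(w)
    sigma = np.sqrt(np.sum(w * (x - mu) ** 2) / np.sum(w))

print(f"robust mean {mu:.2f} vs plain mean {x.mean():.2f}")
print("effective weights of the 5 abnormal points:", np.round(w[-5:], 3))
```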
Assuntos
Interpretação Estatística de Dados , Probabilidade , Reforço Psicológico , Bases de Dados Factuais , Humanos , Análise de Regressão , Estatísticas não ParamétricasRESUMO
Label noise is an important issue in classification, with many potential negative consequences. For example, the accuracy of predictions may decrease, whereas the complexity of inferred models and the number of necessary training samples may increase. Many works in the literature have been devoted to the study of label noise and the development of techniques to deal with it. However, the field lacks a comprehensive survey of the different types of label noise, their consequences and the algorithms that take label noise into account. This paper proposes to fill this gap. First, the definitions and sources of label noise are considered and a taxonomy of the types of label noise is proposed. Second, the potential consequences of label noise are discussed. Third, label noise-robust, label noise cleansing, and label noise-tolerant algorithms are reviewed. For each category of approaches, a short discussion is provided to help practitioners choose the most suitable technique in their own particular field of application. Finally, the design of experiments is also discussed, which may interest researchers who would like to test their own algorithms. In this paper, label noise consists of mislabeled instances: no additional information, such as confidences on labels, is assumed to be available.
ABSTRACT
Feature selection is an important preprocessing step for many high-dimensional regression problems. One of the most common strategies is to select a relevant feature subset based on the mutual information criterion. However, no connection has been established yet in the machine learning literature between the use of mutual information and a regression error criterion. This is an important gap, since minimising such a criterion is ultimately the objective one is interested in. This paper demonstrates that, under some reasonable assumptions, the features selected with the mutual information criterion are the ones minimising the mean squared error and the mean absolute error. Conversely, it is also shown that the mutual information criterion can fail to select optimal features in some situations that we characterise. The theoretical developments presented in this work are expected to lead in practice to a critical and efficient use of mutual information for feature selection.
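A small practical sketch of the setting studied above: features are ranked by estimated mutual information with the target, and the cross-validated mean squared error of a simple regressor is tracked as features are added. The data, the MI estimator (scikit-learn's mutual_info_regression) and the regressor are illustrative assumptions, not the paper's theoretical setting.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(10)
n = 500
X = rng.normal(size=(n, 10))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)  # 2 relevant features

# Rank the features by their estimated mutual information with the target.
mi = mutual_info_regression(X, y, random_state=0)
order = np.argsort(mi)[::-1]

# Relate the MI ranking to a regression-error criterion (cross-validated MSE).
for k in (1, 2, 5, 10):
    sel = order[:k]
    mse = -cross_val_score(KNeighborsRegressor(n_neighbors=10), X[:, sel], y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"top-{k} MI features {sorted(sel.tolist())}: CV MSE = {mse:.3f}")
```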
Subjects
Regression Analysis, Algorithms, Artificial Intelligence, Entropy, Computer-Assisted Image Processing, Informatics, Information Storage and Retrieval, Neural Networks (Computer), Normal Distribution, Signal-to-Noise Ratio
ABSTRACT
We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual's privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals.
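A toy sketch of the uniqueness computation on synthetic, uniformly random traces (real mobility traces are far more structured, so real uniqueness is higher at equal resolution): for each sampled user, a few random spatio-temporal points are drawn and the number of users consistent with all of them is counted. The numbers of users, hours and antennas are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)
n_users, n_hours, n_cells = 2000, 200, 50
# Synthetic traces: traces[u, t] is the antenna (cell) seen by user u during hour t.
traces = rng.integers(0, n_cells, size=(n_users, n_hours))

def is_unique(u, p=4):
    """Do p random spatio-temporal points single out user u among all traces?"""
    hours = rng.choice(n_hours, size=p, replace=False)
    match = np.all(traces[:, hours] == traces[u, hours], axis=1)
    return match.sum() == 1            # only user u matches all p points

sample = rng.choice(n_users, size=300, replace=False)
for p in (2, 3, 4):
    frac = np.mean([is_unique(u, p) for u in sample])
    print(f"{p} points uniquely identify {100 * frac:.0f}% of sampled users")
```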
Subjects
Databases as Topic, Privacy, Cell Phone, Geography, Humans
ABSTRACT
This paper proposes a method for the automatic classification of heartbeats in an ECG signal. Since this task has specific characteristics such as time dependences between observations and a strong class imbalance, a specific classifier is proposed and evaluated on real ECG signals from the MIT arrhythmia database. This classifier is a weighted variant of the conditional random fields classifier. Experiments show that the proposed method outperforms previously reported heartbeat classification methods, especially for pathological heartbeats.
Subjects
Algorithms, Cardiac Arrhythmias/diagnosis, Artificial Intelligence, Computer-Assisted Diagnosis/methods, Electrocardiography/methods, Heart Rate, Automated Pattern Recognition/methods, Cardiac Arrhythmias/physiopathology, Humans, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
The large number of methods for EEG feature extraction demands a careful choice of EEG features for every task. This paper compares three subsets of features obtained by the tracks extraction method, the wavelet transform and the fractional Fourier transform. In particular, we compare the performance of each subset in classification tasks using support vector machines, and we then select possible combinations of features with feature selection methods based on a forward-backward procedure and mutual information as the relevance criterion. Results confirm that the fractional Fourier transform coefficients perform very well and point to the possibility of combining some of these features to improve the performance of the classifier. To reinforce the relevance of the study, we carry out 1000 independent runs using a bootstrap approach and evaluate the statistical significance of the F-score results using the Kruskal-Wallis test.
Subjects
Algorithms, Artificial Intelligence, Computer-Assisted Diagnosis/methods, Electroencephalography/methods, Automated Pattern Recognition/methods, Seizures/diagnosis, Humans, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
This paper uses Mutual Information as an alternative variable selection method for quantitative structure-property relationship data. To evaluate the performance of this criterion, the enantioselectivity of 67 molecules in three different chiral stationary phases is modelled. Partial Least Squares together with three commonly used variable selection techniques was evaluated and then compared with the results obtained when using Mutual Information together with Support Vector Machines. The results show not only that variable selection is a necessary step in quantitative structure-property relationship modelling, but also that Mutual Information combined with Support Vector Machines is a valuable alternative to Partial Least Squares with either correlation between the explanatory and response variables or Genetic Algorithms for variable selection. This study also demonstrates that, by producing models that use a rather small set of variables, interpretability can also be improved.