Búsqueda | BVS CLAP/SMR-OPS/OMS

1.

The Case for Shifting the Rényi Entropy.

Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen.

Entropy (Basel) ; 21(1)2019 Jan 09.

Artículo en Inglés | MEDLINE | ID: mdl-33266762

RESUMEN

We introduce a variant of the Rényi entropy definition that aligns it with the well-known Hölder mean: in the new formulation, the r-th order Rényi Entropy is the logarithm of the inverse of the r-th order Hölder mean. This brings about new insights into the relationship of the Rényi entropy to quantities close to it, like the information potential and the partition function of statistical mechanics. We also provide expressions that allow us to calculate the Rényi entropies from the Shannon cross-entropy and the escort probabilities. Finally, we discuss why shifting the Rényi entropy is fruitful in some applications.

2.

The Rényi Entropies Operate in Positive Semifields.

Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen.

Entropy (Basel) ; 21(8)2019 Aug 08.

Artículo en Inglés | MEDLINE | ID: mdl-33267493

RESUMEN

We set out to demonstrate that the Rényi entropies are better thought of as operating in a type of non-linear semiring called a positive semifield. We show how the Rényi's postulates lead to Pap's g-calculus where the functions carrying out the domain transformation are Rényi's information function and its inverse. In its turn, Pap's g-calculus under Rényi's information function transforms the set of positive reals into a family of semirings where "standard" product has been transformed into sum and "standard" sum into a power-emphasized sum. Consequently, the transformed product has an inverse whence the structure is actually that of a positive semifield. Instances of this construction lead to idempotent analysis and tropical algebra as well as to less exotic structures. We conjecture that this is one of the reasons why tropical algebra procedures, like the Viterbi algorithm of dynamic programming, morphological processing, or neural networks are so successful in computational intelligence applications. But also, why there seem to exist so many computational intelligence procedures to deal with "information" at large.

3.

Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle.

Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen.

Entropy (Basel) ; 20(7)2018 Jun 27.

Artículo en Inglés | MEDLINE | ID: mdl-33265588

RESUMEN

Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze the transformation of a discrete, multivariate source of information X¯ into a discrete, multivariate sink of information Y¯ related by a distribution PX¯Y¯. The first contribution is a decomposition of the maximal potential entropy of (X¯,Y¯), which we call a balance equation, into its (a) non-transferable, (b) transferable, but not transferred, and (c) transferred parts. Such balance equations can be represented in (de Finetti) entropy diagrams, our second set of contributions. The most important of these, the aggregate channel multivariate entropy triangle, is a visual exploratory tool to assess the effectiveness of multivariate data transformations in transferring information from input to output variables. We also show how these decomposition and balance equations also apply to the entropies of X¯ and Y¯, respectively, and generate entropy triangles for them. As an example, we present the application of these tools to the assessment of information transfer efficiency for Principal Component Analysis and Independent Component Analysis as unsupervised feature transformation and selection procedures in supervised classification tasks.

4.

Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis.

González-Calabozo, Jose M; Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen.

BMC Bioinformatics ; 17(1): 374, 2016 Sep 15.

Artículo en Inglés | MEDLINE | ID: mdl-27628041

RESUMEN

BACKGROUND: Gene Expression Data (GED) analysis poses a great challenge to the scientific community that can be framed into the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice to solve this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA). RESULTS: We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process where we provide support for continuous human interaction with data aiming at improving the step of hypothesis abduction and assessment. We focus on the adaptation to human cognition of data interpretation and visualization of the output of EDA. First, we give a proper theoretical background to bi-clustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression we obtain different sequences of hierarchical bi-clusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of persistence and robustness of biclusters to assess them. Second, the resulting biclusters are used to index external omics databases-for instance, Gene Ontology (GO)-thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example confirming results previously published. CONCLUSIONS: The GED analysis problem gets transformed into the exploration of a sequence of lattices enabling the visualization of the hierarchical structure of the biclusters with a certain degree of granularity. The ability of FCA-based bi-clustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene throughout the different biclusters it appears in, to look for relevant biclusters-by observing their genes and what their persistence is-to infer, for instance, hypotheses on their function.

Asunto(s)

Perfilación de la Expresión Génica/métodos , Algoritmos , Análisis por Conglomerados , Minería de Datos , Ontología de Genes , Genómica , Análisis de Secuencia por Matrices de Oligonucleótidos

5.

100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox.

Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen.

PLoS One ; 9(1): e84217, 2014.

Artículo en Inglés | MEDLINE | ID: mdl-24427282

RESUMEN

The most widely spread measure of performance, accuracy, suffers from a paradox: predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. Despite optimizing classification error rate, high accuracy models may fail to capture crucial information transfer in the classification task. We present evidence of this behavior by means of a combinatorial analysis where every possible contingency matrix of 2, 3 and 4 classes classifiers are depicted on the entropy triangle, a more reliable information-theoretic tool for classification assessment. Motivated by this, we develop from first principles a measure of classification performance that takes into consideration the information learned by classifiers. We are then able to obtain the entropy-modulated accuracy (EMA), a pessimistic estimate of the expected accuracy with the influence of the input distribution factored out, and the normalized information transfer factor (NIT), a measure of how efficient is the transmission of information from the input to the output set of classes. The EMA is a more natural measure of classification performance than accuracy when the heuristic to maximize is the transfer of information through the classifier instead of classification error count. The NIT factor measures the effectiveness of the learning process in classifiers and also makes it harder for them to "cheat" using techniques like specialization, while also promoting the interpretability of results. Their use is demonstrated in a mind reading task competition that aims at decoding the identity of a video stimulus based on magnetoencephalography recordings. We show how the EMA and the NIT factor reject rankings based in accuracy, choosing more meaningful and interpretable classifiers.

Asunto(s)

Modelos Teóricos , Algoritmos , Humanos

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA