ABSTRACT
Analyzing deep neural networks (DNNs) via information plane (IP) theory has recently attracted considerable attention as a way to gain insight into, among other things, the generalization ability of DNNs. However, it is by no means obvious how to estimate the mutual information (MI) between each hidden layer and the input or desired output in order to construct the IP. For instance, hidden layers with many neurons require MI estimators that are robust to the high dimensionality of such layers. MI estimators should also be able to handle convolutional layers while remaining computationally tractable enough to scale to large networks. Existing IP methods have not been able to study truly deep convolutional neural networks (CNNs). We propose an IP analysis using the new matrix-based Rényi's entropy coupled with tensor kernels, leveraging the power of kernel methods to represent properties of the probability distribution independently of the dimensionality of the data. Our results shed new light on previous studies concerning small-scale DNNs using a completely new approach. We provide a comprehensive IP analysis of large-scale CNNs, investigating the different training phases and providing new insights into the training dynamics of large-scale neural networks.
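As a rough illustration of the matrix-based Rényi entropy named in this abstract, the sketch below estimates entropy and mutual information directly from Gram matrices. It is a minimal sketch under stated assumptions, not the authors' estimator: it fixes the order at α = 2 (so that tr(A²) can be computed without an eigendecomposition), uses a plain Gaussian kernel rather than tensor kernels, and all function names and the `sigma` parameter are hypothetical.

```python
import math

def gaussian_gram(samples, sigma=1.0):
    """Gaussian-kernel Gram matrix of the samples, normalized to unit trace."""
    gram = [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))
             for y in samples] for x in samples]
    trace = sum(gram[i][i] for i in range(len(gram)))
    return [[v / trace for v in row] for row in gram]

def renyi2_entropy(a):
    """Matrix-based Renyi entropy of order 2 (assumed order): -log2(tr(A^2)).

    For a symmetric matrix, tr(A^2) equals the sum of its squared entries.
    """
    return -math.log2(sum(v * v for row in a for v in row))

def joint_matrix(a, b):
    """Hadamard (element-wise) product of two Gram matrices, renormalized to unit trace."""
    prod = [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    trace = sum(prod[i][i] for i in range(len(prod)))
    return [[v / trace for v in row] for row in prod]

def mutual_information(a, b):
    """I(A; B) = S(A) + S(B) - S(A, B), all estimated from Gram matrices."""
    return renyi2_entropy(a) + renyi2_entropy(b) - renyi2_entropy(joint_matrix(a, b))

# Hypothetical toy data: four scalar samples and a constant "layer output".
x = gaussian_gram([(0.0,), (1.0,), (2.0,), (3.0,)])
constant = gaussian_gram([(5.0,), (5.0,), (5.0,), (5.0,)])
print(mutual_information(x, x), mutual_information(x, constant))
```

A constant variable carries no information about anything, so its estimated MI with `x` is zero, while the MI of `x` with itself is strictly positive.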
ABSTRACT
This letter introduces a new framework for quantifying predictive uncertainty for both data and models that relies on projecting the data into a gaussian reproducing kernel Hilbert space (RKHS) and transforming the data probability density function (PDF) in a way that quantifies the flow of its gradient as a topological potential field (quantified at all points in the sample space). This enables the decomposition of the PDF gradient flow by formulating it as a moment decomposition problem using operators from quantum physics, specifically Schrödinger's formulation. We experimentally show that the higher-order moments systematically cluster the different tail regions of the PDF, thereby providing unprecedented discriminative resolution of data regions having high epistemic uncertainty. In essence, this approach decomposes local realizations of the data PDF in terms of uncertainty moments. We apply this framework as a surrogate tool for predictive uncertainty quantification of point-prediction neural network models, overcoming various limitations of conventional Bayesian-based uncertainty quantification methods. Experimental comparisons with some established methods illustrate performance advantages that our framework exhibits.
ABSTRACT
BACKGROUND: Evidence suggests that increased early postoperative pain (POP) intensities are associated with increased pain in the weeks following surgery. However, it remains unclear which temporal aspects of this early POP relate to later pain experience. In this prospective cohort study, we used wavelet analysis of clinically captured POP intensity data on postoperative days 1 and 2 to characterize slow/fast dynamics of POP intensities and predict pain outcomes on postoperative day 30. METHODS: The study used clinical POP time series from the first 48 hours following surgery from 218 patients to predict their mean POP on postoperative day 30. We first used wavelet analysis to approximate the POP series and to represent the series at different time scales to characterize the early temporal profile of acute POP in the first 2 postoperative days. We then used the wavelet coefficients alongside demographic parameters as inputs to a neural network to predict the risk of severe pain 30 days after surgery. RESULTS: Slow dynamic approximation components, but not fast dynamic detailed components, were linked to pain intensity on postoperative day 30. Despite imbalanced outcome rates, using wavelet decomposition along with a neural network for classification, the model achieved an F score of 0.79 and area under the receiver operating characteristic curve of 0.74 on test-set data for classifying pain intensities on postoperative day 30. The wavelet-based approach outperformed logistic regression (F score of 0.31) and neural network (F score of 0.22) classifiers that were restricted to sociodemographic variables and linear trajectories of pain intensities. CONCLUSIONS: These findings identify latent mechanistic information within the temporal domain of clinically documented acute POP intensity ratings, which are accessible via wavelet analysis, and demonstrate that such temporal patterns inform pain outcomes at postoperative day 30.
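The split into slow "approximation" and fast "detail" components described in this abstract can be sketched with a multilevel discrete wavelet transform. The abstract does not name the wavelet family, so the Haar wavelet used here is an assumption, as are the function names and the toy pain scores.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: pairwise sums and differences."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return (coarsest approximation, [detail at level 1, detail at level 2, ...])."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Hypothetical pain-intensity ratings (0-10 scale) sampled at regular intervals.
pain = [6, 6, 5, 5, 7, 3, 4, 4]
slow, fast = haar_decompose(pain, levels=2)
print(slow)  # slow dynamics: coarse trend of the series
print(fast)  # fast dynamics: short-lived fluctuations at each scale
```

In a pipeline like the one described, coefficients such as `slow` and `fast` would be concatenated with demographic variables and passed to a classifier; that step is omitted here.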
Subject(s)
Pain Measurement, Pain Perception, Pain Threshold, Postoperative Pain/diagnosis, Wavelet Analysis, Aged, Female, Humans, Male, Middle Aged, Computer Neural Networks, Postoperative Pain/etiology, Postoperative Pain/physiopathology, Postoperative Pain/psychology, Predictive Value of Tests, Prospective Studies, Recovery of Function, Severity of Illness Index, Time Factors
ABSTRACT
We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer by layer a compositional hypothesis class (i.e., a feedforward, multilayer architecture) in a supervised setting. In terms of the models, we present a principled method to "kernelize" (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart, Hinton, & Williams, 1986). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide instantiations of the abstract framework under certain architectures and objective functions. Based on these instantiations, we present a layer-wise training algorithm for an l-layer feedforward network for classification, where l≥2 can be arbitrary. This algorithm can be given an intuitive geometric interpretation that makes the learning dynamics transparent. Empirical results are provided to complement our theory. We show that the kernelized networks, trained layer-wise, compare favorably with classical kernel machines as well as other connectionist models trained by BP. We also visualize the inner workings of the greedy kernelized models to validate our claim on the transparency of the layer-wise algorithm.
ABSTRACT
Feature selection aims to select the smallest feature subset that yields the minimum generalization error. In the rich literature in feature selection, information theory-based approaches seek a subset of features such that the mutual information between the selected features and the class labels is maximized. Despite the simplicity of this objective, there still remain several open problems in optimization. These include, for example, the automatic determination of the optimal subset size (i.e., the number of features) or a stopping criterion if the greedy searching strategy is adopted. In this paper, we suggest two stopping criteria by just monitoring the conditional mutual information (CMI) among groups of variables. Using the recently developed multivariate matrix-based Rényi's α-entropy functional, which can be directly estimated from data samples, we showed that the CMI among groups of variables can be easily computed without any decomposition or approximation, hence making our criteria easy to implement and seamlessly integrated into any existing information theoretic feature selection methods with a greedy search strategy.
ABSTRACT
In this paper, we propose an approach to obtain reduced-order models of Markov chains. Our approach is composed of two information-theoretic processes. The first is a means of comparing pairs of stationary chains on different state spaces, which is done via the negative, modified Kullback-Leibler divergence defined on a model joint space. Model reduction is achieved by solving a value-of-information criterion with respect to this divergence. Optimizing the criterion leads to a probabilistic partitioning of the states in the high-order Markov chain. A single free parameter that emerges through the optimization process dictates both the partition uncertainty and the number of state groups. We provide a data-driven means of choosing the 'optimal' value of this free parameter, which removes the need to know a priori the number of state groups in an arbitrary chain.
ABSTRACT
In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls.
ABSTRACT
Accurately encoding time is one of the fundamental challenges faced by the nervous system in mediating behavior. We recently reported that some animals have a specialized population of rhythmically active neurons in their olfactory organs with the potential to peripherally encode temporal information about odor encounters. If these neurons do indeed encode the timing of odor arrivals, it should be possible to demonstrate that this capacity has some functional significance. Here we show how this sensory input can profoundly influence an animal's ability to locate the source of odor cues in realistic turbulent environments, a common task faced by species that rely on olfactory cues for navigation. Using detailed data from a turbulent plume created in the laboratory, we reconstruct the spatiotemporal behavior of a real odor field. We use recurrence theory to show that information about position relative to the source of the odor plume is embedded in the timing between odor pulses. Then, using a parameterized computational model, we show how an animal can use populations of rhythmically active neurons to capture and encode this temporal information in real time, and use it to efficiently navigate to an odor source. Our results demonstrate that the capacity to accurately encode temporal information about sensory cues may be crucial for efficient olfactory navigation. More generally, our results suggest a mechanism for extracting and encoding temporal information from the sensory environment that could have broad utility for neural information processing.
Subject(s)
Appetitive Behavior/physiology, Neurological Models, Odorants/analysis, Olfactory Receptor Neurons/physiology, Smell/physiology, Animals, Computational Biology
ABSTRACT
BACKGROUND: Biomarkers derived from neural activity of the brain present a vital tool for the prediction and evaluation of post-stroke motor recovery, as well as for real-time biofeedback opportunities. METHODS: To encapsulate recovery-related reorganization of brain networks into such biomarkers, we used the generalized measure of association (GMA) and graph analyses, which include global and local efficiency, as well as hemispheric interdensity and intradensity. These methods were applied to electroencephalogram (EEG) data recorded during a study of 30 stroke survivors (21 male, mean age 57.9 years, mean stroke duration 22.4 months) undergoing 12 weeks of intensive therapeutic intervention. RESULTS: We observed that decreases in the intradensity of the unaffected hemisphere are correlated (r_s = -0.46; p < 0.05) with functional recovery, as measured by the upper-extremity portion of the Fugl-Meyer Assessment (FMUE). In addition, high initial values of local efficiency predict greater improvement in FMUE (R² = 0.16; p < 0.05). In a subset of 17 subjects with lesions of the cerebral cortex, reductions in global and local efficiency, as well as in the intradensity of the unaffected hemisphere, are associated with functional improvement (r_s = -0.60, -0.66, -0.75; p < 0.05). Within the same subgroup, high initial values of global and local efficiency are predictive of improved recovery (R² = 0.24, 0.25; p < 0.05). All significant findings were specific to the 12.5-25 Hz band. CONCLUSIONS: These topological measures show promise for prognosis and evaluation of therapeutic outcomes, as well as potential application to BCI-enabled biofeedback.
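Global efficiency, one of the graph measures listed in this abstract, is commonly defined as the average inverse shortest-path length over all pairs of distinct nodes. The sketch below computes it for an unweighted, undirected graph. How the study turns GMA values into graph edges is not stated in the abstract, so the adjacency-list input and the toy graph are assumptions.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def global_efficiency(adjacency):
    """Mean of 1/d(i, j) over ordered pairs i != j; unreachable pairs contribute 0."""
    nodes = list(adjacency)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        dist = shortest_path_lengths(adjacency, source)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))

# Hypothetical three-electrode network connected in a chain: 0 - 1 - 2.
chain = {0: [1], 1: [0, 2], 2: [1]}
print(global_efficiency(chain))
```

A fully connected graph has a global efficiency of 1, and removing edges lowers it, which is why the measure is read as a summary of how directly nodes can exchange information.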
Subject(s)
Nerve Net/pathology, Stroke Rehabilitation, Stroke/pathology, Stroke/psychology, Adult, Aged, Algorithms, Psychological Biofeedback, Biomarkers, Biomechanical Phenomena, Cerebral Cortex/pathology, Electroencephalography, Female, Functional Laterality, Humans, Male, Middle Aged, Predictive Value of Tests, Prognosis, Recovery of Function, Survivors, Treatment Outcome, Young Adult
ABSTRACT
The spatial and temporal characteristics of the visual and acoustic sensory input are indispensable attributes for animals to perform scene analysis. In contrast, research in olfaction has focused almost exclusively on how the nervous system analyzes the quality and quantity of the sensory signal and largely ignored the spatiotemporal dimension especially in longer time scales. Yet, detailed analyses of the turbulent, intermittent structure of water- and air-borne odor plumes strongly suggest that spatio-temporal information in longer time scales can provide major cues for olfactory scene analysis for animals. We show that a bursting subset of primary olfactory receptor neurons (bORNs) in lobster has the unexpected capacity to encode the temporal properties of intermittent odor signals. Each bORN is tuned to a specific range of stimulus intervals, and collectively bORNs can instantaneously encode a wide spectrum of intermittencies. Our theory argues for the existence of a novel peripheral mechanism for encoding the temporal pattern of odor that potentially serves as a neural substrate for olfactory scene analysis.
Subject(s)
Odorants, Olfactory Pathways/physiology, Olfactory Receptor Neurons/physiology, Smell/physiology, Animals, Female, Male, Nephropidae, Substrate Specificity
ABSTRACT
In studies of the nervous system, the choice of metric for the neural responses is a pivotal assumption. For instance, a well-suited distance metric enables us to gauge the similarity of neural responses to various stimuli and assess the variability of responses to a repeated stimulus; both are exploratory steps in understanding how the stimuli are encoded neurally. Here we introduce an approach where the metric is tuned for a particular neural decoding task. Neural spike train metrics have been used to quantify the information content carried by the timing of action potentials. While a number of metrics for individual neurons exist, a method to optimally combine single-neuron metrics into multineuron, or population-based, metrics is lacking. We pose the problem of optimizing multineuron metrics and other metrics using centered alignment, a kernel-based dependence measure. The approach is demonstrated on invasively recorded neural data consisting of both spike trains and local field potentials. The experimental paradigm consists of decoding the location of tactile stimulation on the forepaws of anesthetized rats. We show that the optimized metrics highlight the distinguishing dimensions of the neural response, significantly increase the decoding accuracy, and improve nonlinear dimensionality reduction methods for exploratory neural analysis.
Subject(s)
Action Potentials/physiology, Algorithms, Learning, Neurological Models, Neurons/physiology, Animals, Biometry, Computer Simulation, Humans, Nerve Net/physiology, Physical Stimulation, Rats, Software
ABSTRACT
Predicting the trajectories of pedestrians in crowded scenes is indispensable for self-driving vehicles and autonomous mobile robots, because estimating the future locations of nearby pedestrians supports policy decisions that avoid collisions. It is a challenging problem because humans have different walking motions, and the interactions between humans and objects in the environment, especially among humans themselves, are complex. Previous research focused on how to model human-human interactions but neglected the relative importance of those interactions. To address this issue, a novel mechanism based on correntropy is introduced. The proposed mechanism can not only measure the relative importance of human-human interactions but also build a personal space for each pedestrian. An interaction module that includes this data-driven mechanism is further proposed. In the proposed module, the data-driven mechanism effectively extracts feature representations of dynamic human-human interactions in the scene and calculates corresponding weights to represent the importance of different interactions. To share such social messages among pedestrians, an interaction-aware architecture for trajectory prediction based on a long short-term memory network is designed. Experiments are conducted on two public datasets. The experimental results demonstrate that our model achieves better performance than several recent, well-performing methods.
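Correntropy, the similarity measure this abstract builds on, is usually estimated from samples as the mean of a Gaussian kernel applied to paired differences. The sketch below shows only that standard sample estimator for two trajectories. It is not the interaction mechanism of the paper, and the function name, the unnormalized kernel (the constant factor 1/(√(2π)σ) is omitted), and the toy coordinates are all assumptions.

```python
import math

def correntropy(xs, ys, sigma=1.0):
    """Sample correntropy: average Gaussian kernel of the paired point differences.

    The kernel is left unnormalized, so identical sequences give exactly 1.0.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for p, q in zip(xs, ys):
        squared_distance = sum((a - b) ** 2 for a, b in zip(p, q))
        total += math.exp(-squared_distance / (2.0 * sigma ** 2))
    return total / len(xs)

# Hypothetical 2-D positions (metres) of three pedestrians over three time steps.
walker = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
close_neighbor = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
distant_neighbor = [(0.0, 9.0), (1.0, 9.0), (2.0, 9.0)]
print(correntropy(walker, close_neighbor), correntropy(walker, distant_neighbor))
```

Because the kernel decays quickly with distance, the kernel width `sigma` acts like the radius of a personal space: neighbors well outside it contribute almost nothing.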
ABSTRACT
Continuous-time asynchronous data converters, namely analog-to-digital converters and analog-to-time converters, can be beneficial for certain types of applications, such as the processing of biological signals with sparse information. A particular case of these converters is the integrate-and-fire converter (IFC), which is inspired by the neural system. If a standard-cell-based (SCB) IFC circuit can be made to perform well in advanced technology nodes, it will benefit from the simplicity of SCB circuit designs and can be implemented in widely available field-programmable gate arrays (FPGAs). To this end, this paper proposes two IFC circuits designed and prototyped in a 130 nm CMOS standard process. The first is a novel SCB open-loop dynamic IFC. The second is a closed-loop analog IFC with conventional blocks. This paper presents a thorough comparison between the two IFC circuits. The SCB and analog IFCs have a power dissipation of 59 µW and 53 µW and an energy per pulse of 18 pJ and 1060 pJ, respectively. The SCB IFC has one of the lowest energy-per-pulse figures reported for IFC circuits. The analog IFC, being fully differential, is to our knowledge the first of its kind. Moreover, neither circuit requires an external clock. They can convert signals with a peak-to-peak amplitude from 1.6 mV to 28 mV and 0.6 mV to 2.4 mV, and a frequency range of 2 Hz to 42 kHz and 10 Hz to 4 kHz, for the SCB and analog IFC, respectively. Both present low normalized RMS conversion-plus-reconstruction errors, below 5.2%. The maximum pulse density (average firing rate) is 3300 kHz for the SCB IFC and 50 kHz for the analog IFC.
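The integrate-and-fire principle behind these converters can be illustrated with an idealized discrete-time model: the input is integrated, and a pulse is emitted each time the integral crosses a threshold. This is a behavioral sketch only and does not model either circuit in the paper. The two-sided threshold, the reset to zero, and all names and parameter values are assumptions.

```python
def integrate_and_fire(samples, dt, threshold):
    """Encode a sampled signal as a list of (time, polarity) pulses.

    The running integral is reset to zero after every pulse (assumed reset rule).
    """
    integral = 0.0
    pulses = []
    for index, value in enumerate(samples):
        integral += value * dt
        if integral >= threshold:
            pulses.append(((index + 1) * dt, +1))
            integral = 0.0
        elif integral <= -threshold:
            pulses.append(((index + 1) * dt, -1))
            integral = 0.0
    return pulses

# Hypothetical constant input: the pulse rate is proportional to the amplitude.
dt = 1.0 / 1024.0
pulses = integrate_and_fire([1.0] * 32, dt, threshold=8.0 / 1024.0)
print(pulses)
```

The output carries information in pulse timing rather than in clocked samples, so a quiet input produces few pulses; this is the property that makes the scheme attractive for sparse biological signals.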
Subject(s)
Computer-Assisted Signal Processing, Computer-Assisted Signal Processing/instrumentation, Analog-Digital Conversion, Equipment Design, Humans, Neurons/physiology
ABSTRACT
Deep-predictive-coding networks (DPCNs) are hierarchical, generative models. They rely on feed-forward and feedback connections to modulate latent feature representations of stimuli in a dynamic and context-sensitive manner. A crucial element of DPCNs is a forward-backward inference procedure to uncover sparse, invariant features. However, this inference is a major computational bottleneck. It severely limits the network depth due to learning stagnation. Here, we prove why this bottleneck occurs. We then propose a new forward-inference strategy based on accelerated proximal gradients. This strategy has faster theoretical convergence guarantees than the one used for DPCNs. It overcomes learning stagnation. We also demonstrate that it permits constructing deep and wide predictive-coding networks. Such convolutional networks implement receptive fields that capture well the entire classes of objects on which the networks are trained. This improves the feature representations compared with our lab's previous nonconvolutional and convolutional DPCNs. It yields unsupervised object recognition that surpasses convolutional autoencoders and is on par with convolutional networks trained in a supervised manner.
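To make "accelerated proximal gradients" concrete, the sketch below applies the generic FISTA scheme to a small ℓ1-regularized sparse-coding problem, minimizing 0.5·‖Dz − x‖² + λ‖z‖₁. It is an illustration of the general technique, not the inference procedure of the paper: the dictionary `D`, the caller-supplied Lipschitz constant, and all names are assumptions.

```python
import math

def soft_threshold(value, lam):
    """Proximal operator of lam * |z|: shrink the value toward zero by lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def fista(D, x, lam, lipschitz, iterations=100):
    """Accelerated proximal gradient for 0.5*||D z - x||^2 + lam*||z||_1.

    `lipschitz` must be at least the largest eigenvalue of D^T D (supplied by the caller).
    """
    rows, cols = len(D), len(D[0])
    z = [0.0] * cols
    y = list(z)
    t = 1.0
    for _ in range(iterations):
        # Gradient of the smooth term at the extrapolated point y.
        residual = [sum(D[i][j] * y[j] for j in range(cols)) - x[i] for i in range(rows)]
        grad = [sum(D[i][j] * residual[i] for i in range(rows)) for j in range(cols)]
        # Gradient step followed by the l1 proximal step.
        z_new = [soft_threshold(y[j] - grad[j] / lipschitz, lam / lipschitz)
                 for j in range(cols)]
        # Momentum (extrapolation) step that gives the accelerated rate.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [z_new[j] + ((t - 1.0) / t_new) * (z_new[j] - z[j]) for j in range(cols)]
        z, t = z_new, t_new
    return z

# Hypothetical 2x2 dictionary; the second coefficient is driven to zero.
print(fista([[2.0, 0.0], [0.0, 1.0]], [4.0, 1.0], lam=1.0, lipschitz=4.0))
```

The only difference from plain proximal gradient descent is the extrapolated point `y`, which improves the worst-case convergence rate of the objective from O(1/k) to O(1/k²).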
ABSTRACT
Distinct dynamics in different cortical layers are apparent in neuronal and local field potential (LFP) patterns, yet their associations in the context of laminar processing have been sparingly analyzed. Here, we study the laminar organization of spike-field causal flow within and across visual (V4) and frontal areas (PFC) of monkeys performing a visual task. Using an event-based quantification of LFPs and a directed information estimator, we found area and frequency specificity in the laminar organization of spike-field causal connectivity. Gamma bursts (40-80 Hz) in the superficial layers of V4 largely drove intralaminar spiking. These gamma influences also fed forward up the cortical hierarchy to modulate laminar spiking in PFC. In PFC, the direction of intralaminar information flow was from spikes → fields where these influences dually controlled top-down and bottom-up processing. Our results, enabled by innovative methodologies, emphasize the complexities of spike-field causal interactions amongst multiple brain areas and behavior.
ABSTRACT
Estimating conditional dependence between two random variables given the knowledge of a third random variable is essential in neuroscientific applications to understand the causal architecture of a distributed network. However, existing methods of assessing conditional dependence, such as the conditional mutual information, are computationally expensive, involve free parameters, and are difficult to understand in the context of realizations. In this letter, we discuss a novel approach to this problem and develop a computationally simple and parameter-free estimator. The difference between the proposed approach and the existing ones is that the former expresses conditional dependence in terms of a finite set of realizations, whereas the latter use random variables, which are not available in practice. We call this approach conditional association, since it is based on a generalization of the concept of association to arbitrary metric spaces. We also discuss a novel and computationally efficient approach of generating surrogate data for evaluating the significance of the acquired association value.
Subject(s)
Algorithms, Computer Simulation, Heart Rate/physiology, Sleep Apnea Syndromes/physiopathology, Humans, Respiration
ABSTRACT
Exploratory tools that are sensitive to arbitrary statistical variations in spike train observations open up the possibility of novel neuroscientific discoveries. Developing such tools, however, is difficult due to the lack of Euclidean structure of the spike train space, and an experimenter usually prefers simpler tools that capture only limited statistical features of the spike train, such as mean spike count or mean firing rate. We explore strictly positive-definite kernels on the space of spike trains to offer both a structural representation of this space and a platform for developing statistical measures that explore features beyond count or rate. We apply these kernels to construct measures of divergence between two point processes and use them for hypothesis testing, that is, to observe if two sets of spike trains originate from the same underlying probability law. Although there exist positive-definite spike train kernels in the literature, we establish that these kernels are not strictly definite and thus do not induce measures of divergence. We discuss the properties of both of these existing nonstrict kernels and the novel strict kernels in terms of their computational complexity, choice of free parameters, and performance on both synthetic and real data through kernel principal component analysis and hypothesis testing.
Subject(s)
Action Potentials/physiology, Neurological Models, Neurons/physiology, Visual Pattern Recognition/physiology, Computer Simulation, Probability, Computer-Assisted Signal Processing, Time Factors
ABSTRACT
Inspired by the human vision system and learning, we propose a novel cognitive architecture that understands the content of raw videos in terms of objects without using labels. The architecture achieves four objectives: (1) Decomposing raw frames into objects by exploiting foveal vision and memory. (2) Describing the world by projecting objects on an internal canvas. (3) Extracting relevant objects from the canvas by analyzing the causal relation between objects and rewards. (4) Exploiting the information of relevant objects to facilitate the reinforcement learning (RL) process. In order to speed up learning and better identify objects that produce rewards, the architecture implements learning by causality from the perspective of Wiener and Granger, using object trajectories stored in working memory and the time series of external rewards. A novel non-parametric estimator of directed information using Rényi's entropy is designed and tested. Experiments on three environments show that our architecture extracts most of the relevant objects. It can be thought of as 'understanding' the world in an object-oriented way. As a consequence, our architecture outperforms state-of-the-art deep reinforcement learning in terms of training speed and transfer learning.
Subject(s)
Learning, Reward, Causality, Cognition, Humans, Reinforcement (Psychology)
ABSTRACT
By redefining the conventional notions of layers, we present an alternative view on finitely wide, fully trainable deep neural networks as stacked linear models in feature spaces, leading to a kernel machine interpretation. Based on this construction, we then propose a provably optimal modular learning framework for classification that does not require between-module backpropagation. This modular approach brings new insights into the label requirement of deep learning (DL). It leverages only implicit pairwise labels (weak supervision) when learning the hidden modules. When training the output module, on the other hand, it requires full supervision but achieves high label efficiency, needing as few as ten randomly selected labeled examples (one from each class) to achieve 94.88% accuracy on CIFAR-10 using a ResNet-18 backbone. Moreover, modular training enables fully modularized DL workflows, which then simplify the design and implementation of pipelines and improve the maintainability and reusability of models. To showcase the advantages of such a modularized workflow, we describe a simple yet reliable method for estimating reusability of pretrained modules as well as task transferability in a transfer learning setting. At practically no computation overhead, it precisely described the task space structure of 15 binary classification tasks from CIFAR-10.
Subject(s)
Deep Learning, Computer Neural Networks
ABSTRACT
Objective. Brain-machine interfaces (BMIs) translate neural activity into motor commands to restore motor functions for people with paralysis. Local field potentials (LFPs) are promising for long-term BMIs, since the quality of the recording lasts longer than that of single neuronal spikes. Inferring neuronal spike activity from population activities such as LFPs is challenging, because LFPs stem from synaptic currents flowing in the neural tissue produced by various neuronal ensembles and reflect neural synchronization. Existing studies that combine LFPs with spikes leverage the spectrogram of the former, which can neither detect the transient characteristics of LFP features (here, neuromodulation in a specific frequency band) with high accuracy, nor correlate them with relevant neuronal activity at a sufficient time resolution. Approach. We propose a feature extraction and validation framework to directly extract LFP neuromodulations related to synchronized spike activity using recordings from the primary motor cortex of six Sprague Dawley rats during a lever-press task. We first select important LFP frequency bands relevant to behavior, and then implement a marked point process (MPP) methodology to extract transient LFP neuromodulations. We validate the LFP feature extraction by examining the correlation with the pairwise synchronized firing probability of important neurons, which are selected according to their contribution to behavioral decoding. The highly correlated synchronized firings identified by the LFP neuromodulations are fed into a decoder to check whether they can serve as a reliable neural data source for movement decoding. Main results. We find that the gamma band (30-80 Hz) LFP neuromodulations demonstrate significant correlation with synchronized firings. Compared with the traditional spectrogram-based method, the higher-temporal-resolution MPP method captures the synchronized firing patterns with fewer false alarms, and demonstrates significantly higher correlation than single neuron spikes. The decoding performance using the synchronized neuronal firings identified by the LFP neuromodulations can reach 90% compared to the full recorded neuronal ensembles. Significance. Our proposed framework successfully extracts the sparse LFP neuromodulations that can identify temporally synchronized neuronal spikes with high correlation. The identified neuronal spike pattern demonstrates high decoding performance, which suggests that LFPs can be used as an effective modality for long-term BMI decoding.