Results 1 - 20 of 71
1.
Neural Netw ; 160: 148-163, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36640490

ABSTRACT

A biological neural network in the cortex forms a neural field. Neurons in the field have their own receptive fields, and the connection weights between two neurons are random but highly correlated when their receptive fields are in close proximity. In this paper, we study such neural fields in a multilayer architecture and investigate their supervised learning. We empirically compare the performance of our field model with that of randomly connected deep networks. The behavior of a randomly connected network is analyzed on the basis of the neural tangent kernel regime, a recent development in the machine learning theory of over-parameterized networks; for most randomly connected neural networks, it has been shown that global minima always exist in their small neighborhoods. We numerically show that this claim also holds for our neural fields. In more detail, our model has two structures: (i) each neuron in a field has a continuously distributed receptive field, and (ii) the initial connection weights are random but not independent, being correlated when the positions of neurons are close in each layer. We show that such a multilayer neural field is more robust than conventional models when input patterns are deformed by noise disturbances. Moreover, its generalization ability can be slightly superior to that of conventional models.
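The correlated-but-random initialization in (ii) can be sketched numerically. In the toy below, the 1-D positions, the Gaussian kernel, and the correlation length `ell` are illustrative assumptions, not the paper's construction; the point is only that weight rows become correlated when receptive-field centers are close.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # neurons in one layer
pos = np.linspace(0.0, 1.0, n)            # receptive-field centers (assumed 1-D)
ell = 0.05                                # correlation length (assumed value)

# Gaussian kernel over distances between postsynaptic positions
K = np.exp(-((pos[:, None] - pos[None, :]) ** 2) / (2.0 * ell**2))
L = np.linalg.cholesky(K + 1e-6 * np.eye(n))    # jitter for numerical stability

W_iid = rng.standard_normal((n, n)) / np.sqrt(n)  # conventional random init
W_field = L @ W_iid                               # rows correlated along `pos`

# Nearby rows are highly correlated; distant rows are nearly uncorrelated.
c_near = float(np.corrcoef(W_field[0], W_field[1])[0, 1])
c_far = float(np.corrcoef(W_field[0], W_field[n // 2])[0, 1])
print(round(c_near, 2), round(c_far, 2))
```

Replacing the Cholesky factor with the identity recovers the conventional independent initialization, which makes the two models easy to compare in simulation.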


Subjects
Deep Learning; Neural Networks, Computer; Neurons/physiology; Machine Learning
2.
Neural Comput ; 33(8): 2274-2307, 2021 07 26.
Article in English | MEDLINE | ID: mdl-34310678

ABSTRACT

The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth, and sample size when the network has random weights and is sufficiently wide. This study covers two widely used FIMs: for regression with linear output and for classification with softmax output. Both FIMs asymptotically show pathological eigenvalue spectra in the sense that a small number of eigenvalues become large outliers depending on the width or sample size, while the others are much smaller. This implies that the local shape of the parameter space or loss landscape is very sharp in a few specific directions while almost flat in the other directions. In particular, the softmax output disperses the outliers and makes a tail of the eigenvalue density spread from the bulk. We also show that pathological spectra appear in other variants of the FIM: one is the neural tangent kernel; another is a metric for the input signal and feature space that arises from feedforward signal propagation. Thus, we provide a unified perspective on the FIM and its variants that will lead to a more quantitative understanding of learning in large-scale DNNs.
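The outlier-plus-bulk spectrum is easy to reproduce empirically. The sketch below is a simplification, not the paper's setting: it restricts to output-layer gradients of a random two-layer net with linear output, so each per-sample gradient is just the hidden activation vector, and it uses a sigmoid nonlinearity (whose nonzero-mean activations produce the large outlier eigenvalue).

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_hid, n_samp = 50, 200, 400        # illustrative sizes
W1 = rng.standard_normal((n_hid, n_in)) / np.sqrt(n_in)
X = rng.standard_normal((n_samp, n_in))

# f(x) = w2 . sigmoid(W1 x), so the gradient wrt the output weights w2
# on sample x is the hidden activation vector sigmoid(W1 x).
J = 1.0 / (1.0 + np.exp(-(X @ W1.T)))     # (n_samp, n_hid) per-sample gradients
F = J.T @ J / n_samp                      # empirical Fisher, (n_hid, n_hid)
eig = np.linalg.eigvalsh(F)               # ascending eigenvalues

outlier_ratio = eig[-1] / eig.mean()      # largest eigenvalue vs mean eigenvalue
print(outlier_ratio)                      # a single outlier dwarfs the bulk
```

The mean activation vector contributes a rank-one component of norm on the order of the width, which is why the top eigenvalue separates so sharply from the bulk.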


Subjects
Machine Learning; Neural Networks, Computer
3.
Neural Comput ; 32(8): 1431-1447, 2020 08.
Article in English | MEDLINE | ID: mdl-32521215

ABSTRACT

It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated analytical theories and discussions concerning this striking fact, but rigorous theories are very complicated. We give an elementary geometrical proof by using a simple model for the purpose of elucidating its structure. We show that high-dimensional geometry plays a magical role. When we project a high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform distribution over the sphere shrinks to a gaussian distribution with negligibly small variances and covariances.
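The claim in the last sentence can be verified numerically; the dimensions and sample counts below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 1000, 10000                     # ambient dimension, number of samples

# Uniform samples on the unit sphere S^{n-1}: normalize gaussian vectors.
x = rng.standard_normal((m, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)

proj = x[:, :2]                        # orthogonal projection to 2 coordinates
var = proj.var(axis=0)                 # each entry is close to 1/n
cov = float(np.cov(proj.T)[0, 1])      # close to 0
print(n * var, cov)                    # scaled variances near 1, covariance near 0
```

As the ambient dimension grows, the projected coordinates approach independent gaussians with variance 1/n, which is the "magical" concentration the proof exploits.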

4.
Brain Nerve ; 71(12): 1349-1355, 2019 Dec.
Article in Japanese | MEDLINE | ID: mdl-31787624

ABSTRACT

The brain is an information machine endowed with mind and consciousness, acquired through a long history of evolution. Artificial intelligence (AI) aims at realizing intelligent functions in computers. We describe the mechanisms of deep learning in AI and compare them with brain functions. In particular, we consider the function of consciousness in the brain and its relation to AI. Mathematical neuroscience provides powerful methods for both AI and brain science. We conclude by emphasizing that further interactions between the two fields are important for both.


Subjects
Artificial Intelligence; Brain/physiology; Consciousness; Humans; Neurosciences
5.
Neural Comput ; 31(5): 827-848, 2019 05.
Article in English | MEDLINE | ID: mdl-30883281

ABSTRACT

We propose a new divergence on the manifold of probability distributions, building on the entropic regularization of optimal transportation problems. As Cuturi (2013) showed, regularizing the optimal transport problem with an entropic term brings several computational benefits. However, because of that regularization, the resulting approximation of the optimal transport cost does not define a proper distance or divergence between probability distributions. We recently introduced a family of divergences connecting the Wasserstein distance and the Kullback-Leibler divergence from an information geometry point of view (see Amari, Karakida, & Oizumi, 2018). However, that proposal was not able to retain key intuitive aspects of the Wasserstein geometry, such as translation invariance, which plays a key role in the more general problem of computing optimal transport barycenters. The divergence we propose in this work retains such properties and admits an intuitive interpretation.
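The entropic regularization referred to above is typically computed with Sinkhorn's matrix-scaling iterations. The sketch below (cost matrix, regularization strength, and iteration count are illustrative choices) shows the defect the abstract raises: the regularized transport cost stays strictly positive even when the two distributions coincide, so it is not a proper divergence.

```python
import numpy as np

def sinkhorn_cost(p, q, C, eps=0.1, iters=500):
    """Entropy-regularized OT cost between histograms p, q with cost matrix C."""
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(iters):             # alternating marginal scaling (Sinkhorn)
        v = q / (K.T @ u)
        u = p / (K @ v)
    P = u[:, None] * K * v[None, :]    # regularized optimal coupling
    return float(np.sum(P * C))        # transport cost under that coupling

n = 5
grid = np.arange(n) / n
C = (grid[:, None] - grid[None, :]) ** 2   # squared-distance cost, zero diagonal
p = np.full(n, 1.0 / n)

self_cost = sinkhorn_cost(p, p, C)
print(self_cost > 0.0)                     # True: positive even though p == q
```

Because the entropic term smooths the coupling off the diagonal, the self-cost does not vanish; constructions such as the one proposed in this paper subtract or rebalance these self-terms to recover divergence-like behavior.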

6.
IEEE Trans Neural Netw Learn Syst ; 29(11): 5242-5248, 2018 11.
Article in English | MEDLINE | ID: mdl-29994374

ABSTRACT

Neurons in a network can be either active or inactive. Given a subset of neurons in a network, is it possible for that subset to evolve into an active oscillator under some external periodic stimulus? Furthermore, can these oscillator neurons be observable, that is, can the oscillator be stable? This paper explores this possibility and establishes an important property: any subset of neurons can be intermittently co-activated to form a stable oscillator by applying some external periodic input, without any further condition. Thus, the existence of intermittently active oscillator neurons is an essential property of such networks. Moreover, this paper shows that, under some conditions, a subset of neurons can be fully co-activated to form a stable oscillator; such neurons are called selectable oscillator neurons. Necessary and sufficient conditions are established for a subset of neurons to be selectable oscillator neurons in linear threshold recurrent neural networks. It is proved that a subset of neurons forms selectable oscillator neurons if and only if the real part of each eigenvalue of the associated synaptic connection weight submatrix of the network is not larger than one. This simple condition makes the concept of selectable oscillator neurons tractable. The selectable oscillator neurons can be regarded as memories stored in the synaptic connections of the network, which offers a new perspective on memories in neural networks, different from equilibrium-type attractors.

7.
Annu Rev Stat Appl ; 5: 183-214, 2018 Mar.
Article in English | MEDLINE | ID: mdl-30976604

ABSTRACT

Mathematical and statistical models have played important roles in neuroscience, especially by describing the electrical activity of neurons recorded individually, or collectively across large networks. As the field moves forward rapidly, new challenges are emerging. For maximal effectiveness, those working to advance computational neuroscience will need to appreciate and exploit the complementary strengths of mechanistic theory and the statistical paradigm.

8.
Neural Comput ; 30(1): 1-33, 2018 01.
Article in English | MEDLINE | ID: mdl-29064781

ABSTRACT

The dynamics of supervised learning play a central role in deep learning, taking place in the parameter space of a multilayer perceptron (MLP). We review the history of supervised stochastic gradient learning, focusing on its singular structure and natural gradient. The parameter space includes singular regions in which parameters are not identifiable. One of our results is a full exploration of the dynamical behaviors of stochastic gradient learning in an elementary singular network. The bad news is its pathological nature, in which part of the singular region becomes an attractor and another part a repulser at the same time, forming a Milnor attractor. A learning trajectory is attracted by the attractor region, staying in it for a long time, before it escapes the singular region through the repulser region. This is typical of plateau phenomena in learning. We demonstrate the strange topology of a singular region by introducing blow-down coordinates, which are useful for analyzing the natural gradient dynamics. We confirm that the natural gradient dynamics are free of critical slowdown. The second main result is the good news: the interactions of elementary singular networks eliminate the attractor part, and the Milnor-type attractors disappear. This explains why large-scale networks do not suffer from serious critical slowdowns due to singularities. We finally show that the unit-wise natural gradient is effective for learning in spite of its low computational cost.
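The natural-gradient update preconditions the ordinary gradient with the inverse Fisher information. A minimal sketch on a toy model, not the paper's MLP setting: for a univariate Gaussian N(mu, sigma^2) the Fisher matrix is diagonal, F = diag(1/sigma^2, 2/sigma^2), so the natural gradient just rescales the gradient of the negative log-likelihood coordinate-wise. The learning rate and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(3.0, 2.0, size=10_000)   # samples from N(3, 2^2)

mu, sigma, lr = 0.0, 1.0, 0.5
for _ in range(200):
    # Gradients of the mean negative log-likelihood of N(mu, sigma^2)
    g_mu = (mu - data.mean()) / sigma**2
    g_sigma = 1.0 / sigma - ((data - mu) ** 2).mean() / sigma**3
    # Natural-gradient step: precondition with F^{-1} = diag(sigma^2, sigma^2/2)
    mu -= lr * sigma**2 * g_mu
    sigma -= lr * (sigma**2 / 2.0) * g_sigma

print(mu, sigma)   # converges to the sample mean and standard deviation
```

On this toy model the preconditioned update converges geometrically to the maximum-likelihood solution regardless of the current sigma, which is the parameterization-invariance that motivates natural gradient in the singular MLP setting.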

9.
Proc Natl Acad Sci U S A ; 113(51): 14817-14822, 2016 12 20.
Article in English | MEDLINE | ID: mdl-27930289

ABSTRACT

Assessment of causal influences is a ubiquitous and important subject across diverse research fields. Drawn from consciousness studies, integrated information is a measure that defines integration as the degree of causal influences among elements. Whereas pairwise causal influences between elements can be quantified with existing methods, quantifying multiple influences among many elements poses two major mathematical difficulties. First, overestimation occurs due to interdependence among influences if each influence is separately quantified in a part-based manner and then simply summed over. Second, it is difficult to isolate causal influences while avoiding noncausal confounding influences. To resolve these difficulties, we propose a theoretical framework based on information geometry for the quantification of multiple causal influences with a holistic approach. We derive a measure of integrated information, which is geometrically interpreted as the divergence between the actual probability distribution of a system and an approximated probability distribution where causal influences among elements are statistically disconnected. This framework provides intuitive geometric interpretations harmonizing various information theoretic measures in a unified manner, including mutual information, transfer entropy, stochastic interaction, and integrated information, each of which is characterized by how causal influences are disconnected. In addition to the mathematical assessment of consciousness, our framework should help to analyze causal relationships in complex systems in a complete and hierarchical manner.
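One member of the family of measures discussed above is mutual information: the KL divergence from the joint distribution to the approximation in which the influences between two elements are statistically disconnected. For a zero-mean Gaussian pair this divergence has a closed form, -0.5*log(1 - rho^2); the covariance values below are illustrative.

```python
import numpy as np

def gaussian_mi(cov):
    """Mutual information (nats) of a bivariate Gaussian with covariance `cov`."""
    rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return -0.5 * np.log(1.0 - rho**2)

cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
mi = float(gaussian_mi(cov))
print(mi)   # -0.5 * ln(1 - 0.36) ≈ 0.223 nats
```

Transfer entropy, stochastic interaction, and integrated information follow the same pattern with different constraints on which influences are disconnected, which is the unified geometric picture the paper develops.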

10.
Neural Netw ; 79: 78-87, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27131468

ABSTRACT

The restricted Boltzmann machine (RBM) is an essential constituent of deep learning, but it is hard to train by using maximum likelihood (ML) learning, which minimizes the Kullback-Leibler (KL) divergence. Instead, contrastive divergence (CD) learning has been developed as an approximation of ML learning and is widely used in practice. To clarify the performance of CD learning, in this paper, we analytically derive the fixed points where the ML and CDn learning rules converge in two types of RBMs: one with Gaussian visible and Gaussian hidden units and the other with Gaussian visible and Bernoulli hidden units. In addition, we analyze the stability of the fixed points. As a result, we find that the stable points of the CDn learning rule coincide with those of the ML learning rule in a Gaussian-Gaussian RBM. We also reveal that larger principal components of the input data are extracted at the stable points. Moreover, in a Gaussian-Bernoulli RBM, we find that both ML and CDn learning can extract independent components at one of the stable points. Our analysis demonstrates that the same feature components as those extracted by ML learning are extracted simply by performing CD1 learning. Expanding this study should elucidate the specific solutions obtained by CD learning in other types of RBMs or in deep networks.
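A CD-1 step contrasts data statistics with one-step reconstruction statistics. The sketch below is for the Gaussian-Gaussian case with unit conditional variances; the sizes, learning rate, and data are illustrative assumptions, and the expectation (consistent with the fixed-point analysis above) is that the learned weight vector aligns with the leading principal component when its variance exceeds 1.

```python
import numpy as np

rng = np.random.default_rng(3)

def cd1_update(W, v, lr=1e-3):
    # One CD-1 step for a Gaussian-Gaussian RBM with unit variances:
    # h | v ~ N(W^T v, I) and v' | h ~ N(W h, I).
    n_hid = W.shape[1]
    h0 = v @ W + rng.standard_normal((len(v), n_hid))    # hidden | data
    v1 = h0 @ W.T + rng.standard_normal(v.shape)         # one-step reconstruction
    h1 = v1 @ W + rng.standard_normal((len(v), n_hid))   # hidden | reconstruction
    pos = v.T @ h0 / len(v)    # data statistics  <v h^T>
    neg = v1.T @ h1 / len(v)   # reconstruction statistics
    return W + lr * (pos - neg)

# Data whose leading direction has variance > 1, so a stable point should
# extract it; the remaining directions (variance < 1) should be suppressed.
v = rng.standard_normal((500, 4)) * np.array([3.0, 0.8, 0.5, 0.2])
W = rng.standard_normal((4, 1)) * 0.01
for _ in range(2000):
    W = cd1_update(W, v)
print(np.round(W[:, 0], 2))   # dominated by the first component
```

In simulation the weight magnitude settles near the analytical fixed point where the model variance matches the data variance along the extracted direction, while the low-variance directions decay toward zero.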


Subjects
Machine Learning; Neural Networks, Computer; Normal Distribution; Algorithms; Learning; Probability Learning
11.
PLoS Comput Biol ; 12(1): e1004654, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26796119

ABSTRACT

Accumulating evidence indicates that the capacity to integrate information in the brain is a prerequisite for consciousness. Integrated Information Theory (IIT) of consciousness provides a mathematical approach to quantifying the information integrated in a system, called integrated information, Φ. Integrated information is defined theoretically as the amount of information a system generates as a whole, above and beyond the amount of information its parts independently generate. IIT predicts that the amount of integrated information in the brain should reflect levels of consciousness. Empirical evaluation of this theory requires computing integrated information from neural data acquired in experiments, although difficulties with using the original measure Φ preclude such computations. Although some practical measures have been previously proposed, we found that these measures fail to satisfy the theoretical requirements of a measure of integrated information. Measures of integrated information should satisfy the following lower and upper bounds: the lower bound of integrated information should be 0, attained when the system does not generate information (no information) or when the system comprises independent parts (no integration); the upper bound is the amount of information generated by the whole system. Here we derive a novel practical measure Φ* by introducing the concept of mismatched decoding developed in information theory. We show that Φ* is properly bounded from below and above, as required of a measure of integrated information. We derive the analytical expression of Φ* under the Gaussian assumption, which makes it readily applicable to experimental data. Our novel measure Φ* can generally be used as a measure of integrated information in research on consciousness, and also as a tool for network analysis in diverse areas of biology.


Subjects
Consciousness/physiology; Information Theory; Models, Neurological; Animals; Cerebral Cortex/physiology; Computational Biology; Electrocorticography; Macaca; Normal Distribution
12.
IEEE Trans Neural Netw Learn Syst ; 27(4): 736-48, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26068876

ABSTRACT

We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CANDECOMP/PARAFAC (CP)-rank tensor capturing the global information and a sparse tensor capturing the local information (also considered as outliers), thus providing a robust predictive distribution over missing entries. The low-CP-rank tensor is modeled by multilinear interactions between multiple latent factors on which column sparsity is enforced by a hierarchical prior, while the sparse tensor is modeled by a hierarchical view of the Student-t distribution that associates an individual hyperparameter with each element independently. For model learning, we develop an efficient variational inference under a fully Bayesian treatment, which can effectively prevent overfitting and scales linearly with data size. In contrast to existing related works, our method can perform model selection automatically and implicitly, without the need for parameter tuning. More specifically, it can discover the ground truth of the CP rank and automatically adapt the sparsity-inducing priors to various types of outliers. In addition, the tradeoff between the low-rank approximation and the sparse representation can be optimized in the sense of maximum model evidence. Extensive experiments and comparisons with many state-of-the-art algorithms on both synthetic and real-world data sets demonstrate the advantages of our method from several perspectives.

13.
Vision Res ; 120: 61-73, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26278166

ABSTRACT

Natural scenes contain richer perceptual information in their spatial phase structure than in their amplitudes. Modeling the phase structure of natural scenes may explain higher-order structure inherent to natural scenes, which is neglected in most classical models of redundancy reduction. Only recently have a few models represented images using a complex form of receptive fields (RFs) and analyzed their complex responses in terms of amplitude and phase. However, these complex representation models often tacitly assume a uniform phase distribution without empirical support. The structure of the spatial phase distributions of natural scenes, in the form of the relative contributions of paired responses of RFs in quadrature, has not been explored statistically until now. Here, we investigate the spatial phase structure of natural scenes using complex forms of various Gabor-like RFs. To analyze the distributions of the spatial phase responses, we constructed a mixture model that accounts for multi-modal circular distributions, along with an EM algorithm for estimating its parameters. Based on the likelihood, we report the presence of both uniform and structured bimodal phase distributions in natural scenes. The latter bimodal distributions were symmetric, with two peaks separated by about 180°. Thus, the redundancy in natural scenes can be further removed by using the bimodal phase distributions obtained from these RFs in the complex representation models. These results predict that both phase-invariant and phase-sensitive complex cells are required to represent the regularities of natural scenes in visual systems.


Subjects
Models, Statistical; Visual Cortex/physiology; Visual Perception/physiology; Humans; Spatial Processing
14.
Article in English | MEDLINE | ID: mdl-25871186

ABSTRACT

In a manner similar to the molecular chaos that underlies the stable thermodynamics of gases, a neuronal system may exhibit microscopic instability in individual neuronal dynamics while a macroscopic order of the entire population remains stable. In this study, we analyze the microscopic stability of a network of neurons whose macroscopic activity obeys stable dynamics, expressing a monostable, bistable, or periodic state. We reveal that the network exhibits a variety of dynamical states of microscopic instability within a given stable macroscopic dynamics. The presence of a variety of dynamical states in such a simple random network implies even more abundant microscopic fluctuations in real neural networks, which consist of more complex and hierarchically structured interactions.


Subjects
Models, Neurological; Nerve Net/cytology; Neurons; Nonlinear Dynamics
15.
Article in English | MEDLINE | ID: mdl-23496575

ABSTRACT

We study the dynamics of randomly connected networks composed of binary Boolean elements and those composed of binary majority vote elements. We elucidate their differences in both sparsely and densely connected cases. The quickness of large network dynamics is usually quantified by the length of transient paths, an analytically intractable measure. For discrete-time dynamics of networks of binary elements, we address this dilemma with an alternative unified framework by using a concept termed state concentration, defined as the exponent of the average number of t-step ancestors in state transition graphs. The state transition graph is defined by nodes corresponding to network states and directed links corresponding to transitions. Using this exponent, we interrogate the dynamics of random Boolean and majority vote networks. We find that extremely sparse Boolean networks and majority vote networks with arbitrary density achieve quickness, owing in part to long-tailed in-degree distributions. As a corollary, only relatively dense majority vote networks can achieve both quickness and robustness.
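The ancestor-counting idea can be made concrete on a network small enough to enumerate every state. The synchronous majority-vote update and the sizes below are illustrative choices; the quantity computed is the average number of one-step ancestors of the state a trajectory lands on, which exceeds 1 whenever trajectories merge.

```python
import itertools
from collections import Counter

import numpy as np

rng = np.random.default_rng(5)
n = 10                                        # small enough to enumerate 2^n states
J = rng.choice([-1.0, 1.0], size=(n, n))      # dense random +/-1 couplings

def step(state):
    s = np.asarray(state) * 2 - 1             # {0,1} -> {-1,+1}
    return tuple((J @ s > 0).astype(int))     # synchronous majority-vote update

states = list(itertools.product([0, 1], repeat=n))
succ = {s: step(s) for s in states}           # deterministic transition map

# Average 1-step ancestor count over the images actually reached; its growth
# with t is the "state concentration" exponent described in the abstract.
indegree = Counter(succ.values())
avg_ancestors = float(np.mean([indegree[succ[s]] for s in states]))
print(avg_ancestors)                          # > 1: trajectories concentrate
```

Iterating `succ` t times and repeating the count gives the t-step version; for dense majority-vote couplings the count grows quickly, reflecting the rapid merging of trajectories the paper quantifies.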


Subjects
Algorithms; Models, Statistical; Computer Simulation
16.
Neural Netw ; 37: 48-51, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23092761

ABSTRACT

Theoreticians have been enchanted by the secrets of the brain for many years: how and why does it work so well? There is a long history of searching for its mechanisms. Theoretical and mathematical scientists have proposed various models of neural networks, which has led to the birth of a new field of research. We can think of the 'pre-historic' period of Rashevsky and Wiener, and then the period of perceptrons, which marked the beginning of learning machines, followed by neurodynamics approaches and connectionist approaches. We are now in the period of computational neuroscience. I have been working in this field for nearly half a century and have experienced its repeated rise and fall. Having reached a very old age, I would like to recount my half-century of endeavors toward establishing mathematical neuroscience, from a personal, even biased, point of view. It would be my pleasure if my experiences could encourage young researchers to participate in mathematical neuroscience.


Subjects
Artificial Intelligence/history; Mathematics/history; Models, Neurological; Neurosciences/history; Animals; History, 20th Century; History, 21st Century; Humans
17.
Neural Comput ; 24(12): 3191-212, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22970868

ABSTRACT

We study the Bayesian process of estimating the features of the environment. We focus on two aspects of the Bayesian process: how the estimation error depends on the prior distribution of features, and how the prior distribution can be learned from experience. The accuracy of the perception is underestimated when each feature of the environment is considered independently, because many different features of the environment are usually highly correlated and the estimation error greatly depends on the correlations. The self-consistent learning process renews the prior distribution of correlated features jointly with the estimation of the environment. Here, maximum a posteriori probability (MAP) estimation decreases the effective dimensions of the feature vector. There are critical noise levels in self-consistent learning with MAP estimation that cause hysteresis behaviors in learning. The self-consistent learning process with stochastic Bayesian estimation (SBE) makes the presumed distribution of environmental features converge to the true distribution for any level of channel noise. However, SBE is less accurate than MAP estimation. We also discuss another stochastic method of estimation, SBE2, which has a smaller estimation error than SBE and does not exhibit hysteresis.


Subjects
Learning/physiology; Perception/physiology; Animals; Bayes Theorem; Humans
18.
Cogn Neurodyn ; 6(2): 169-83, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22511913

ABSTRACT

The neural representation of motion aftereffects induced by various visual flows (translational, rotational, motion-in-depth, and translational transparent flows) was studied under the hypothesis that imbalances in discharge activities would occur in favor of the direction opposite to the adapting stimulation in monkey MST cells (cells in the medial superior temporal area), which can discriminate the mode (i.e., translational, rotational, or motion-in-depth) of a given flow. In single-unit recording experiments conducted on anaesthetized monkeys, we found that the rate of spontaneous discharge and the sensitivity to a test stimulus moving in the preferred direction decreased after an adapting stimulation moving in the preferred direction, whereas they increased after an adapting stimulation moving in the null direction. To consistently explain the bidirectional perception of a transparent visual flow and its unidirectional motion aftereffect by the same hypothesis, we need to assume the existence of two subtypes of MSTd cells showing directionally selective responses to a translational flow: component cells and integration cells. Our physiological investigation revealed that the MSTd cells could be divided into two types: one responded to a transparent flow with two peaks, at the instants when the direction of one of the component flows matched the preferred direction of the cell, and the other responded with a single peak, at the instant when the direction of the integrated motion matched the preferred direction. In psychophysical experiments on human subjects, we found evidence for the existence of component and integration representations in the human brain. To explain the different motion percepts, i.e., two transparent flows during presentation of the flows and a single flow in the direction opposite to the integrated flow after the flow stimuli stop, we suggest that the pattern-discrimination system can select, from the two motion representations, the one consistent with the perception of the pattern. We discuss the computational aspects related to the integration of component motion fields.

19.
Neural Comput ; 24(7): 1722-39, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22428593

ABSTRACT

Detecting and characterizing causal interdependencies and couplings between different activated brain areas from functional neuroimage time series measurements of their activity constitutes a significant step toward understanding the process of brain functions. In this letter, we make the simple point that all current statistics used to make inferences about directed influences in functional neuroimage time series are variants of the same underlying quantity. This includes directed transfer entropy, transinformation, Kullback-Leibler formulations, conditional mutual information, and Granger causality. Crucially, in the case of autoregressive modeling, the underlying quantity is the likelihood ratio that compares models with and without directed influences from the past when modeling the influence of one time series on another. This framework is also used to derive the relation between these measures of directed influence and the complexity or the order of directed influence. These results provide a framework for unifying the Kullback-Leibler divergence, Granger causality, and the complexity of directed influence.
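The point about a common underlying likelihood ratio can be illustrated for Granger causality with autoregressive models: the statistic compares the residual variance of an AR model of x with and without y's past. The synthetic data below has a known y → x influence; the coefficients are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 5000
y = rng.standard_normal(T)                      # driver series
x = np.zeros(T)
for t in range(1, T):                           # known influence: y[t-1] -> x[t]
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.standard_normal()

target, x_past, y_past = x[1:], x[:-1], y[:-1]

def resid_var(design, tgt):
    beta, *_ = np.linalg.lstsq(design, tgt, rcond=None)
    return np.var(tgt - design @ beta)          # residual variance of the AR fit

full = np.column_stack([x_past, y_past])        # model including y's past
restricted = x_past[:, None]                    # model excluding y's past

# Log-likelihood ratio of the two Gaussian AR models = Granger statistic
gc = (T - 1) * np.log(resid_var(restricted, target) / resid_var(full, target))
print(gc > 0.0)   # True: y strongly Granger-causes x
```

Running the same comparison in the reverse direction (does x's past help predict y?) yields a statistic near zero here, which is the asymmetry that makes the measure directional.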


Subjects
Algorithms; Brain Mapping/methods; Brain/physiology; Models, Theoretical; Animals; Humans
20.
PLoS Comput Biol ; 8(3): e1002385, 2012.
Article in English | MEDLINE | ID: mdl-22412358

ABSTRACT

Precise spike coordination between the spiking activities of multiple neurons is suggested as an indication of coordinated network activity in active cell assemblies. Spike correlation analysis aims to identify such cooperative network activity by detecting excess spike synchrony in simultaneously recorded multiple neural spike sequences. Cooperative activity is expected to organize dynamically during behavior and cognition; therefore currently available analysis techniques must be extended to enable the estimation of multiple time-varying spike interactions between neurons simultaneously. In particular, new methods must take advantage of the simultaneous observations of multiple neurons by addressing their higher-order dependencies, which cannot be revealed by pairwise analyses alone. In this paper, we develop a method for estimating time-varying spike interactions by means of a state-space analysis. Discretized parallel spike sequences are modeled as multi-variate binary processes using a log-linear model that provides a well-defined measure of higher-order spike correlation in an information geometry framework. We construct a recursive Bayesian filter/smoother for the extraction of spike interaction parameters. This method can simultaneously estimate the dynamic pairwise spike interactions of multiple single neurons, thereby extending the Ising/spin-glass model analysis of multiple neural spike train data to a nonstationary analysis. Furthermore, the method can estimate dynamic higher-order spike interactions. To validate the inclusion of the higher-order terms in the model, we construct an approximation method to assess the goodness-of-fit to spike data. In addition, we formulate a test method for the presence of higher-order spike correlation even in nonstationary spike data, e.g., data from awake behaving animals. The utility of the proposed methods is tested using simulated spike data with known underlying correlation dynamics. 
Finally, we apply the methods to neural spike data simultaneously recorded from the motor cortex of an awake monkey and demonstrate that the higher-order spike correlation organizes dynamically in relation to a behavioral demand.
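The log-linear measure underlying the method reduces, for a pair of binary neurons, to the interaction parameter θ = log(p11·p00 / (p10·p01)), which vanishes exactly under independence. A self-contained check; the probability values are made up for illustration.

```python
import numpy as np

def pairwise_theta(p00, p01, p10, p11):
    """Log-linear pairwise interaction of two binary neurons; 0 iff independent."""
    return float(np.log(p11 * p00 / (p10 * p01)))

# Independent neurons with spike probabilities 0.2 and 0.3:
p1, p2 = 0.2, 0.3
theta_ind = pairwise_theta((1 - p1) * (1 - p2), (1 - p1) * p2,
                           p1 * (1 - p2), p1 * p2)

# Excess synchrony: extra probability mass on the joint-spike state (0.12
# instead of the 0.066 expected under similar marginals):
theta_sync = pairwise_theta(0.60, 0.18, 0.10, 0.12)
print(round(theta_ind, 6), round(theta_sync, 3))
```

The state-space method in the paper estimates such θ parameters (including higher-order ones over triplets and beyond) as they drift over time, rather than from a single stationary histogram as here.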


Subjects
Action Potentials/physiology; Algorithms; Models, Neurological; Motor Cortex/physiology; Movement/physiology; Nerve Net/physiology; Neurons/physiology; Animals; Computer Simulation; Electroencephalography/methods; Haplorhini; Statistics as Topic; Task Performance and Analysis