1.
Phys Rev E ; 108(2-1): 024310, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37723812

ABSTRACT

We study the binary and continuous negative-margin perceptrons as simple nonconvex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes nonmonotonic, indicating a breakup of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different: even beyond the disappearance of the wide flat minima, the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we present numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1-RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers, even when trained in the highly underconstrained regime of very negative margins.
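For readers unfamiliar with the central quantity above, the local entropy around a reference solution is usually defined along the following lines (a hedged sketch of the standard definition in this line of work; the paper's exact normalization and constraint set may differ):

\[
\mathcal{S}_{\mathrm{loc}}(d) \;=\; \frac{1}{N}\,\mathbb{E}_{\xi}\,\ln \mathcal{N}(\tilde{w}, d),
\qquad
\mathcal{N}(\tilde{w}, d) \;=\; \sum_{w} \mathbb{X}_{\xi}(w)\,\delta\!\big(d(w,\tilde{w}) - d\big),
\]

where \( \mathbb{X}_{\xi}(w) \) indicates whether configuration \( w \) satisfies all pattern constraints \( \xi \), \( d(\cdot,\cdot) \) is a normalized distance, and \( \tilde{w} \) is a reference solution. Wide flat minima correspond to references for which \( \mathcal{S}_{\mathrm{loc}}(d) \) stays close to its maximal value down to small distances; the nonmonotonicity mentioned above is the signature of the space of robust solutions breaking into disconnected pieces.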

2.
Phys Rev E ; 106(1-1): 014116, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35974501

ABSTRACT

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for nonconvex optimization. In this paper, we use methods from statistical physics of disordered systems to analytically study the computational fallout of overparameterization in nonconvex binary neural network models, trained on data generated from a structurally simpler but "hidden" network. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalization performance. A first transition happens at the so-called interpolation point, when solutions begin to exist (perfect fitting becomes possible). This transition reflects the properties of typical solutions, which however are in sharp minima and hard to sample. After a gap, a second transition occurs, with the discontinuous appearance of a different kind of "atypical" structure: wide regions of the weight space that are particularly solution-dense and have good generalization properties. The two kinds of solutions coexist, with the typical ones being exponentially more numerous, but empirically we find that efficient algorithms sample the atypical, rare ones. This suggests that the atypical phase transition is the relevant one for learning. The results of numerical tests with realistic networks on observables suggested by the theory are consistent with this scenario.

3.
Phys Rev Lett ; 127(27): 278301, 2021 Dec 31.
Article in English | MEDLINE | ID: mdl-35061428

ABSTRACT

The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here, we show that wide flat minima arise as complex extensive structures, from the coalescence of minima around "high-margin" (i.e., locally robust) configurations. Despite being exponentially rare compared to zero-margin ones, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.

4.
Methods Mol Biol ; 2074: 57-65, 2020.
Article in English | MEDLINE | ID: mdl-31583630

ABSTRACT

Even if we know that two families of homologous proteins interact, we do not necessarily know which specific proteins interact inside each species. The reason is that most families contain paralogs, i.e., more than one homologous sequence per species. We have developed a tool to predict interacting paralogs between the two protein families, based on the idea of inter-protein coevolution: our algorithm matches those members of the two protein families that belong to the same species and collectively maximize the detectable coevolutionary signal. It is applicable even in cases where simpler methods fail, e.g., those based on genomic co-localization of genes coding for interacting proteins or on orthology. In this method paper, we present an efficient implementation of this idea based on freely available software.
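To make the matching step concrete, below is a minimal, hedged sketch: within a single species, paralogs of family A are paired with paralogs of family B by solving an assignment problem that maximizes a precomputed inter-protein coevolutionary score. The score matrix, the toy numbers, and the use of a generic assignment solver are illustrative assumptions; the actual tool's pipeline (alignment handling, score estimation, iteration over species) is not reproduced here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_paralogs(score):
    """score[a, b]: coevolutionary score for pairing paralog a of family A
    with paralog b of family B within the same species; higher is better."""
    rows, cols = linear_sum_assignment(-score)   # assignment maximizing the total score
    return [(int(a), int(b)) for a, b in zip(rows, cols)]

# toy example: 3 paralogs of family A vs 4 paralogs of family B in one species
score = np.array([[0.9, 0.1, 0.2, 0.0],
                  [0.2, 0.8, 0.1, 0.3],
                  [0.1, 0.2, 0.1, 0.7]])
print(match_paralogs(score))   # -> [(0, 0), (1, 1), (2, 3)]
```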


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Protein Binding , Software
5.
Proc Natl Acad Sci U S A ; 117(1): 161-170, 2020 01 07.
Article in English | MEDLINE | ID: mdl-31871189

ABSTRACT

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points, and such minimizers are often effective at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex 1- and 2-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.
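As a toy illustration of the last algorithmic point (SGD on a cross-entropy loss combined with a slow reduction of the weight norm), the sketch below trains a single-layer model on random patterns; the model size, learning rate, and norm schedule are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

# Hedged sketch: SGD on a logistic/cross-entropy loss for a perceptron, while the
# norm of the weight vector is slowly reduced along training (all constants assumed).
rng = np.random.default_rng(2)
N, P = 201, 120
X = rng.choice([-1.0, 1.0], size=(P, N)) / np.sqrt(N)   # random input patterns
y = rng.choice([-1.0, 1.0], size=P)                      # random labels

w = rng.normal(size=N)
w *= np.sqrt(N) / np.linalg.norm(w)                      # start at norm sqrt(N)
lr, epochs = 0.5, 400

for epoch in range(epochs):
    for mu in rng.permutation(P):                        # plain SGD, one pattern at a time
        margin = y[mu] * (X[mu] @ w)
        grad = -y[mu] * X[mu] / (1.0 + np.exp(margin))   # d/dw of log(1 + exp(-margin))
        w -= lr * grad
    target_norm = np.sqrt(N) * (1.0 - 0.5 * epoch / epochs)   # slowly shrinking norm
    w *= target_norm / np.linalg.norm(w)

print("training error:", np.mean(np.sign(X @ w) != y))
```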

6.
Phys Rev Lett ; 123(17): 170602, 2019 Oct 25.
Article in English | MEDLINE | ID: mdl-31702271

ABSTRACT

Rectified linear units (ReLUs) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem which can undercut stochastic gradient descent learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: While the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.

7.
Interface Focus ; 8(6): 20180033, 2018 Dec 06.
Article in English | MEDLINE | ID: mdl-30443331

ABSTRACT

Stochastic neural networks are a prototypical computational device able to build a probabilistic representation of an ensemble of external stimuli. Building on the relationship between inference and learning, we derive a synaptic plasticity rule that relies only on delayed activity correlations, and that shows a number of remarkable features. Our delayed-correlations matching (DCM) rule satisfies some basic requirements for biological feasibility: finite and noisy afferent signals, Dale's principle and asymmetry of synaptic connections, locality of the weight update computations. Nevertheless, the DCM rule is capable of storing a large, extensive number of patterns as attractors in a stochastic recurrent neural network, under general scenarios without requiring any modification: it can deal with correlated patterns, a broad range of architectures (with or without hidden neuronal states), one-shot learning with the palimpsest property, all the while avoiding the proliferation of spurious attractors. When hidden units are present, our learning rule can be employed to construct Boltzmann machine-like generative models, exploiting the addition of hidden neurons in feature extraction and classification tasks.
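The following is one possible reading of the rule described above, written as a contrastive update on one-step-delayed correlations of a stochastic recurrent network (a clamped phase driven by strong afferent currents versus a free-running phase). This is a heavily hedged sketch: the actual DCM rule may differ in important details, and the dynamics, field strength, and learning rate below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N, P = 64, 4                       # neurons, stored patterns (toy sizes)
beta, field, lr, T = 2.0, 4.0, 0.05, 100

patterns = rng.choice([-1.0, 1.0], size=(P, N))
W = np.zeros((N, N))               # recurrent weights (asymmetry allowed)

def glauber_step(s, W, ext):
    """One parallel stochastic update with external input ext."""
    h = W @ s + ext
    p = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
    return np.where(rng.random(N) < p, 1.0, -1.0)

for _ in range(200):
    mu = rng.integers(P)
    # clamped phase: a strong afferent current drives the network toward the pattern
    s, C_data = patterns[mu].copy(), np.zeros((N, N))
    for _ in range(T):
        s_next = glauber_step(s, W, field * patterns[mu])
        C_data += np.outer(s_next, s)
        s = s_next
    # free phase: identical dynamics without the afferent current
    s, C_model = rng.choice([-1.0, 1.0], size=N), np.zeros((N, N))
    for _ in range(T):
        s_next = glauber_step(s, W, 0.0)
        C_model += np.outer(s_next, s)
        s = s_next
    W += lr * (C_data - C_model) / T           # match one-step-delayed correlations
    np.fill_diagonal(W, 0.0)

# quick retrieval check: start near a stored pattern and let the free dynamics run
s = patterns[0] * np.where(rng.random(N) < 0.9, 1.0, -1.0)   # ~10% flipped bits
for _ in range(200):
    s = glauber_step(s, W, 0.0)
print("overlap with the stored pattern:", float(s @ patterns[0]) / N)
```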

8.
Phys Rev Lett ; 120(26): 268103, 2018 Jun 29.
Article in English | MEDLINE | ID: mdl-30004730

ABSTRACT

Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension that makes it possible to train discrete deep neural networks is also investigated.
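A minimal sketch of the kind of procedure described above: each binary synapse is parametrized by a real value, giving a mean weight m_i = tanh(theta_i), and gradient descent is performed on the real parameters through the expected training error under a central-limit (Gaussian) approximation of the pre-activation. The surrogate loss, the shortcut of keeping the variance fixed in the gradient, and all hyperparameters are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 501, 150
X = rng.choice([-1.0, 1.0], size=(P, N))    # random binary patterns
y = rng.choice([-1.0, 1.0], size=P)         # random labels

theta = 0.01 * rng.normal(size=N)           # real parameters of the weight distribution
lr, epochs = 0.1, 500

for _ in range(epochs):
    m = np.tanh(theta)                       # mean of each binary synapse
    mean_h = X @ m                           # mean pre-activation per pattern
    var_h = ((1.0 - m**2) * X**2).sum(axis=1) + 1e-9
    z = y * mean_h / np.sqrt(var_h)
    # expected per-pattern error under the Gaussian approximation is Phi(-z);
    # its derivative w.r.t. z is -phi(z) (the variance derivative is neglected here)
    weight = -np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
    dz_dm = (y[:, None] * X) / np.sqrt(var_h)[:, None]
    grad_m = (weight[:, None] * dz_dm).mean(axis=0)
    theta -= lr * grad_m * (1.0 - m**2)      # chain rule through m = tanh(theta)

w_binary = np.sign(theta)                    # read out a binary solution
print("training error of the binarized weights:", np.mean(np.sign(X @ w_binary) != y))
```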

9.
Proc Natl Acad Sci U S A ; 115(7): 1457-1462, 2018 02 13.
Article in English | MEDLINE | ID: mdl-29382764

ABSTRACT

Quantum annealers aim at solving nonconvex optimization problems by exploiting cooperative tunneling effects to escape local minima. The underlying idea consists of designing a classical energy function whose ground states are the sought optimal solutions of the original optimization problem, and adding a controllable transverse quantum field to generate tunneling processes. A key challenge is to identify classes of nonconvex optimization problems for which quantum annealing remains efficient while thermal annealing fails. We show that this happens for a wide class of problems which are central to machine learning. Their energy landscapes are dominated by local minima that cause exponential slowdown of classical thermal annealers while simulated quantum annealing converges efficiently to rare dense regions of optimal solutions.
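In its standard formulation (a hedged sketch; the specific problem classes and annealing schedules studied in the paper are not reproduced here), the construction reads

\[
H(t) \;=\; H_{\mathrm{cl}}\!\left(\sigma^{z}_{1},\dots,\sigma^{z}_{N}\right) \;-\; \Gamma(t)\sum_{i=1}^{N}\sigma^{x}_{i},
\]

where \( H_{\mathrm{cl}} \) encodes the classical cost function in the \( \sigma^{z} \) basis and the transverse field \( \Gamma(t) \) is slowly reduced from a large initial value to zero; simulated quantum annealing samples this process with path-integral (Suzuki-Trotter) Monte Carlo.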

10.
Genome Biol ; 18(1): 37, 2017 02 20.
Article in English | MEDLINE | ID: mdl-28219439

ABSTRACT

BACKGROUND: Distinct RNA species may compete for binding to microRNAs (miRNAs). This competition creates an indirect interaction between miRNA targets, which behave as miRNA sponges and eventually influence each other's expression levels. Theoretical predictions suggest that not only the mean expression levels of targets but also the fluctuations around the means are coupled through miRNAs. This may result in striking effects on a broad range of cellular processes, such as cell differentiation and proliferation. Although several studies have reported the functional relevance of this mechanism of interaction, detailed experiments that study this phenomenon under controlled conditions, within a physiological range, are lacking. RESULTS: We used an experimental design based on two bidirectional plasmids and flow cytometry measurements of cotransfected mammalian cells. We validated a stochastic gene interaction model that describes how mRNAs can influence each other's fluctuations in a miRNA-dependent manner in single cells. We show that miRNA-target correlations eventually lead to either bimodal cell population distributions with high and low target expression states, or correlated fluctuations across targets when the pools of unbound targets and miRNAs are at near-equimolar concentrations. We found that there is an optimal range of conditions for the onset of cross-regulation, which is compatible with 10-1000 copies of targets per cell. CONCLUSIONS: Our results are summarized in a phase diagram for miRNA-mediated cross-regulation that links experimentally measured quantities and effective model parameters. This phase diagram can be applied to in vivo studies of RNAs that are in competition for miRNA binding.
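A minimal stochastic-simulation sketch of the competition scenario (not the paper's full model, nor its experimentally calibrated parameters): two target mRNAs share a miRNA pool, and joint miRNA-target degradation couples their fluctuations. The rate constants below are illustrative assumptions chosen to sit near the titration (near-equimolar) regime.

```python
import numpy as np

rng = np.random.default_rng(3)

k_r1, k_r2, k_s = 2.0, 2.0, 4.0    # transcription rates: target 1, target 2, miRNA
g_r, g_s = 0.02, 0.02              # spontaneous degradation rates
k_on = 0.005                       # miRNA-target co-degradation rate

def gillespie(t_max=5000.0):
    t, r1, r2, s = 0.0, 0, 0, 0
    samples = []
    while t < t_max:
        rates = np.array([
            k_r1, k_r2, k_s,                # production
            g_r * r1, g_r * r2, g_s * s,    # free degradation
            k_on * r1 * s, k_on * r2 * s,   # miRNA-mediated co-degradation
        ])
        total = rates.sum()
        t += rng.exponential(1.0 / total)
        reaction = rng.choice(len(rates), p=rates / total)
        if reaction == 0:   r1 += 1
        elif reaction == 1: r2 += 1
        elif reaction == 2: s += 1
        elif reaction == 3: r1 -= 1
        elif reaction == 4: r2 -= 1
        elif reaction == 5: s -= 1
        elif reaction == 6: r1, s = r1 - 1, s - 1
        else:               r2, s = r2 - 1, s - 1
        samples.append((r1, r2))
    return np.array(samples)

traj = gillespie()
r1, r2 = traj[len(traj) // 2:, 0], traj[len(traj) // 2:, 1]   # discard the transient
print("correlation between free target levels:", np.corrcoef(r1, r2)[0, 1])
```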


Subject(s)
Gene Expression Regulation , MicroRNAs/genetics , RNA Interference , RNA, Messenger/genetics , Single-Cell Analysis , Cell Survival/genetics , Gene Expression , Genes, Reporter , Plasmids/genetics , Transcription, Genetic
11.
Proc Natl Acad Sci U S A ; 113(48): E7655-E7662, 2016 11 29.
Article in English | MEDLINE | ID: mdl-27856745

ABSTRACT

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare, but extremely dense and accessible, regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
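The algorithmic scheme summarized in the abstract lends itself to a compact illustration. Below is a minimal, hedged sketch of a replicated gradient-descent variant on a toy perceptron problem: several replicas each minimize the original loss while being attracted to their common center. The toy task, the hinge-type loss, and all constants are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, R = 101, 300, 5                 # weights, patterns, replicas
gamma, lr, epochs = 0.05, 0.01, 200   # coupling to the center, step size, epochs

X = rng.choice([-1.0, 1.0], size=(P, N))    # random input patterns
y = rng.choice([-1.0, 1.0], size=P)         # random target labels
W = rng.normal(size=(R, N))                 # one weight vector per replica

def hinge_grad(w, X, y):
    """Subgradient of a hinge-type error loss for a perceptron."""
    margins = y * (X @ w)
    active = margins < 1.0
    return -(y[active, None] * X[active]).sum(axis=0) / len(y)

for _ in range(epochs):
    center = W.mean(axis=0)                 # the "driving assignment": replica barycenter
    for a in range(R):
        g = hinge_grad(W[a], X, y) + gamma * (W[a] - center)
        W[a] -= lr * g                      # each replica feels its own loss plus the coupling

center = W.mean(axis=0)
print("training error of the replica center:", np.mean(np.sign(X @ center) != y))
```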

12.
Proc Natl Acad Sci U S A ; 113(43): 12186-12191, 2016 10 25.
Article in English | MEDLINE | ID: mdl-27729520

ABSTRACT

Understanding protein-protein interactions is central to our understanding of almost all complex biological processes. Computational tools exploiting rapidly growing genomic databases to characterize protein-protein interactions are urgently needed. Such methods should connect multiple scales from evolutionary conserved interactions between families of homologous proteins, over the identification of specifically interacting proteins in the case of multiple paralogs inside a species, down to the prediction of residues being in physical contact across interaction interfaces. Statistical inference methods detecting residue-residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; they require, however, large joint alignments of homologous protein pairs known to interact. The generation of such alignments is a complex computational task on its own; application of coevolutionary modeling has, in turn, been restricted to proteins without paralogs, or to bacterial systems with the corresponding coding genes being colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect the different scales, and simultaneously to match interacting paralogs, to identify interprotein residue-residue contacts and to discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond cases treatable so far.


Subject(s)
Evolution, Molecular , Protein Binding/genetics , Protein Interaction Mapping , Proteins/genetics , Algorithms , Biophysical Phenomena , Computational Biology , Protein Conformation , Proteins/chemistry , Sequence Alignment , Sequence Homology, Amino Acid
13.
Phys Rev E ; 93(5): 052313, 2016 May.
Article in English | MEDLINE | ID: mdl-27300916

ABSTRACT

Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large-deviations analysis which unveiled the existence of peculiar dense regions in the space of synaptic states, which account for the possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with multiple states and generally more plausible biological features. The results clearly indicate that the overall qualitative picture is unchanged with respect to the binary case, and very robust to variations of the details of the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic precision (i.e., the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that, for practical applications, only a few bits may be needed for near-optimal performance, consistent with recent biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient algorithmic search strategies.


Subject(s)
Learning/physiology , Models, Neurological , Synapses/physiology , Neural Networks, Computer
14.
Phys Rev Lett ; 115(12): 128101, 2015 Sep 18.
Article in English | MEDLINE | ID: mdl-26431018

ABSTRACT

We show that discrete synaptic weights can be efficiently used for learning in large scale neural systems, and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically. Here, we introduce a novel method that allows us to find analytical evidence for the existence of subdominant and extremely dense regions of solutions. Numerical experiments confirm these findings. We also show that the dense regions are surprisingly accessible by simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions. These outcomes extend to synapses with multiple states and to deeper neural architectures. The large deviation measure also suggests how to design novel algorithmic schemes for optimization based on local entropy maximization.

15.
PLoS Comput Biol ; 11(8): e1004439, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26291608

ABSTRACT

Understanding the theoretical foundations of how memories are encoded and retrieved in neural populations is a central challenge in neuroscience. A popular theoretical scenario for modeling memory function is the attractor neural network scenario, whose prototype is the Hopfield model. The model simplicity and the locality of the synaptic update rules come at the cost of a poor storage capacity, compared with the capacity achieved with perceptron learning algorithms. Here, by transforming the perceptron learning rule, we present an online learning rule for a recurrent neural network that achieves near-maximal storage capacity without an explicit supervisory error signal, relying only upon locally accessible information. The fully-connected network consists of excitatory binary neurons with plastic recurrent connections and non-plastic inhibitory feedback stabilizing the network dynamics; the memory patterns to be memorized are presented online as strong afferent currents, producing a bimodal distribution for the neuron synaptic inputs. Synapses corresponding to active inputs are modified as a function of the value of the local fields with respect to three thresholds. Above the highest threshold, and below the lowest threshold, no plasticity occurs. In between these two thresholds, potentiation/depression occurs when the local field is above/below an intermediate threshold. We simulated and analyzed a network of binary neurons implementing this rule and measured its storage capacity for different sizes of the basins of attraction. The storage capacity obtained through numerical simulations is shown to be close to the value predicted by analytical calculations. We also measured the dependence of capacity on the strength of external inputs. Finally, we quantified the statistics of the resulting synaptic connectivity matrix, and found that both the fraction of zero weight synapses and the degree of symmetry of the weight matrix increase with the number of stored patterns.
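As one concrete reading of the plasticity rule just described, below is a minimal sketch of the per-pattern update for a single neuron. The threshold values, step size, and weight bounds are illustrative assumptions; the paper uses its own parametrization and network-level dynamics.

```python
import numpy as np

THETA_LOW, THETA_MID, THETA_HIGH = 0.0, 5.0, 10.0   # assumed threshold values
DW, W_MAX = 1.0, 20.0                                # assumed step size and weight bound

def update_synapses(w, x, h):
    """
    w : incoming synaptic weights of the neuron (excitatory, bounded)
    x : binary (0/1) presynaptic activity for the current pattern
    h : local field (total synaptic input) of the neuron for this pattern
    """
    if h <= THETA_LOW or h >= THETA_HIGH:
        return w                                     # field too low or too high: no plasticity
    sign = 1.0 if h > THETA_MID else -1.0            # potentiation above, depression below
    w = w + sign * DW * x                            # only synapses with active inputs change
    return np.clip(w, 0.0, W_MAX)                    # keep weights bounded and non-negative
```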


Subject(s)
Memory/physiology , Models, Neurological , Nerve Net/physiology , Neurons/physiology , Algorithms , Computational Biology , Computer Simulation , Neural Networks, Computer
16.
PLoS One ; 9(3): e92721, 2014.
Article in English | MEDLINE | ID: mdl-24663061

ABSTRACT

In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.
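A minimal sketch of the Gaussian-modeling idea described above, written as a contact-scoring routine: one-hot encode the alignment, regularize and invert the empirical covariance matrix, and rank residue pairs by the norm of the corresponding coupling block. The pseudocount form, the Frobenius-norm score, and the average-product correction are common choices assumed here, not necessarily the options of the released implementation.

```python
import numpy as np

def gaussian_dca_scores(msa, q=21, pseudocount=0.5):
    """msa: integer array (n_sequences, L) with amino-acid states coded 0..q-1."""
    n, L = msa.shape
    onehot = np.zeros((n, L * q))
    onehot[np.arange(n)[:, None], np.arange(L) * q + msa] = 1.0

    f = onehot.mean(axis=0)
    C = (onehot.T @ onehot) / n - np.outer(f, f)                    # empirical covariance
    C = (1 - pseudocount) * C + pseudocount * np.eye(L * q) / q     # simple regularization

    J = -np.linalg.inv(C)                # couplings from the precision matrix
    scores = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            block = J[i * q:(i + 1) * q, j * q:(j + 1) * q]
            scores[i, j] = scores[j, i] = np.linalg.norm(block)     # Frobenius norm per pair

    # average-product correction, commonly applied before ranking contacts
    row = scores.sum(axis=0) / (L - 1)
    mean_all = scores.sum() / (L * (L - 1))
    scores -= np.outer(row, row) / mean_all
    np.fill_diagonal(scores, 0.0)
    return scores
```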


Subject(s)
Models, Molecular , Proteins/chemistry , Proteins/metabolism , Bacteria/cytology , Multivariate Analysis , Normal Distribution , Protein Binding , Protein Conformation , Protein Structure, Tertiary , Sequence Alignment , Signal Transduction , Time Factors
17.
Pac Symp Biocomput ; : 39-50, 2014.
Article in English | MEDLINE | ID: mdl-24297532

ABSTRACT

Advances in experimental techniques resulted in abundant genomic, transcriptomic, epigenomic, and proteomic data that have the potential to reveal critical drivers of human diseases. Complementary algorithmic developments enable researchers to map these data onto protein-protein interaction networks and infer which signaling pathways are perturbed by a disease. Despite this progress, integrating data across different biological samples or patients remains a substantial challenge because samples from the same disease can be extremely heterogeneous. Somatic mutations in cancer are an infamous example of this heterogeneity. Although the same signaling pathways may be disrupted in a cancer patient cohort, the distribution of mutations is long-tailed, and many driver mutations may only be detected in a small fraction of patients. We developed a computational approach to account for heterogeneous data when inferring signaling pathways by sharing information across the samples. Our technique builds upon the prize-collecting Steiner forest problem, a network optimization algorithm that extracts pathways from a protein-protein interaction network. We recover signaling pathways that are similar across all samples yet still reflect the unique characteristics of each biological sample. Leveraging data from related tumors improves our ability to recover the disrupted pathways and reveals patient-specific pathway perturbations in breast cancer.


Subject(s)
Algorithms , Neoplasms/genetics , Neoplasms/metabolism , Protein Interaction Maps , Signal Transduction , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Computational Biology , Databases, Genetic , ErbB Receptors/genetics , ErbB Receptors/metabolism , Female , Humans , Models, Biological , Mutation
18.
PLoS Comput Biol ; 9(8): e1003167, 2013.
Article in English | MEDLINE | ID: mdl-23950700

ABSTRACT

The anterior inferotemporal cortex (IT) is the highest stage along the hierarchy of visual areas that, in primates, processes visual objects. Although several lines of evidence suggest that IT primarily represents visual shape information, some recent studies have argued that neuronal ensembles in IT code the semantic membership of visual objects (i.e., represent conceptual classes such as animate and inanimate objects). In this study, we investigated to what extent semantic, rather than purely visual, information is represented in IT by performing a multivariate analysis of IT responses to a set of visual objects. By relying on a variety of machine-learning approaches (including a cutting-edge clustering algorithm that has been recently developed in the domain of statistical physics), we found that, in most instances, IT representation of visual objects is accounted for by their similarity at the level of shape or, more surprisingly, low-level visual properties. Only in a few cases did we observe IT representations of semantic classes that were not explainable by the visual similarity of their members. Overall, these findings reassert the primary function of IT as a conveyor of explicit visual shape information, and reveal that low-level visual properties are represented in IT to a greater extent than previously appreciated. In addition, our work demonstrates how combining a variety of state-of-the-art multivariate approaches, and carefully estimating the contribution of shape similarity to the representation of object categories, can substantially advance our understanding of neuronal coding of visual objects in cortex.


Subject(s)
Models, Neurological , Neurons/physiology , Temporal Lobe/physiology , Vision, Ocular/physiology , Algorithms , Animals , Cluster Analysis , Computational Biology , Discriminant Analysis , Haplorhini , Multivariate Analysis , Neurons/cytology , Semantics , Temporal Lobe/cytology
19.
Proc Natl Acad Sci U S A ; 104(26): 11079-84, 2007 Jun 26.
Article in English | MEDLINE | ID: mdl-17581884

ABSTRACT

Recent experimental studies indicate that synaptic changes induced by neuronal activity are discrete jumps between a small number of stable states. Learning in systems with discrete synapses is known to be a computationally hard problem. Here, we study a neurobiologically plausible on-line learning algorithm that derives from belief propagation algorithms. We show that it performs remarkably well in a model neuron with binary synapses, and a finite number of "hidden" states per synapse, that has to learn a random classification task. Such a system is able to learn a number of associations close to the theoretical limit in time that is sublinear in system size. This is to our knowledge the first on-line algorithm that is able to achieve efficiently a finite number of patterns learned per binary synapse. Furthermore, we show that performance is optimal for a finite number of hidden states that becomes very small for sparse coding. The algorithm is similar to the standard "perceptron" learning algorithm, with an additional rule for synaptic transitions that occur only if a currently presented pattern is "barely correct." In this case, the synaptic changes are metaplastic only (change in hidden states and not in actual synaptic state), stabilizing the synapse in its current state. Finally, we show that a system with two visible states and K hidden states is much more robust to noise than a system with K visible states. We suggest that this rule is sufficiently simple to be easily implemented by neurobiological systems or in hardware.
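A minimal sketch that follows the abstract's description rather than the paper's exact prescription: a binary-weight neuron whose synapses carry hidden states, where misclassified patterns trigger perceptron-like moves of the hidden states and "barely correct" patterns trigger metaplastic-only moves that stabilize the current visible state. The number of hidden states, the threshold defining "barely correct", and all other constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N, P, K = 1001, 400, 8              # synapses, random patterns, hidden states per sign
THETA = np.sqrt(N)                  # "barely correct" if 0 < stability < THETA (assumed scale)

xi = rng.choice([-1, 1], size=(P, N))     # input patterns
sigma = rng.choice([-1, 1], size=P)       # desired outputs
h = rng.choice([-1, 1], size=N)           # hidden states in {-K..-1, 1..K}; sign(h) = weight

for sweep in range(100):
    errors = 0
    for mu in rng.permutation(P):
        w = np.sign(h)
        stability = sigma[mu] * (xi[mu] @ w)
        if stability <= 0:                             # misclassified: perceptron-like move
            errors += 1
            h = h + sigma[mu] * xi[mu]
            h[h == 0] = (sigma[mu] * xi[mu])[h == 0]   # hidden states skip zero
        elif stability < THETA:                        # barely correct: metaplastic-only move
            agree = (w * xi[mu] * sigma[mu]) > 0
            h[agree] = h[agree] + np.sign(h[agree])    # push deeper; visible weight unchanged
        np.clip(h, -K, K, out=h)
    if errors == 0:
        break

print("classification errors in the last sweep:", errors)
```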


Subject(s)
Algorithms , Learning , Neural Networks, Computer , Synapses , Models, Neurological , Nerve Net