Results 1 - 20 of 89
1.
ArXiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38764587

ABSTRACT

The integration of neural representations in the two hemispheres is an important problem in neuroscience. Recent experiments revealed that odor responses in cortical neurons driven by separate stimulation of the two nostrils are highly correlated. This bilateral alignment points to structured inter-hemispheric connections, but the detailed mechanism remains unclear. Here, we hypothesized that continuous exposure to environmental odors shapes these projections and modeled it as online learning with a local Hebbian rule. We found that Hebbian learning with sparse connections achieves bilateral alignment, exhibiting a linear trade-off between speed and accuracy. We identified an inverse scaling relationship between the number of cortical neurons and the inter-hemispheric projection density required for a desired alignment accuracy, i.e., more cortical neurons allow sparser inter-hemispheric projections. We next compared the alignment performance of the local Hebbian rule and the global stochastic-gradient-descent (SGD) learning used for artificial neural networks. We found that although SGD leads to the same alignment accuracy with modestly sparser connectivity, the same inverse scaling relation holds. We showed that their similar performance originates from the fact that the update vectors of the two learning rules align significantly throughout the learning process. This insight may inspire efficient sparse local learning algorithms for more complex problems.

2.
Proc Natl Acad Sci U S A ; 121(21): e2401567121, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38748573

ABSTRACT

Nearly all circadian clocks maintain a period that is insensitive to temperature changes, a phenomenon known as temperature compensation (TC). Yet, it is unclear whether there is any common feature among different systems that exhibit TC. From a general timescale invariance, we show that TC relies on the existence of certain period-lengthening reactions wherein the period of the system increases strongly with the rates in these reactions. By studying several generic oscillator models, we show that this counterintuitive dependence is nonetheless a common feature of oscillators in the nonlinear (far-from-onset) regime where the oscillation can be separated into fast and slow phases. The increase of the period with the period-lengthening reaction rates occurs when the amplitude of the slow phase in the oscillation increases with these rates while the progression speed in the slow phase is controlled by other rates of the system. The positive dependence of the period on the period-lengthening rates balances its inverse dependence on other kinetic rates in the system, which gives rise to robust TC in a wide range of parameters. We demonstrate the existence of such period-lengthening reactions and their relevance for TC in all four model systems we considered. Theoretical results for a model of the Kai system are supported by experimental data. A study of the energy dissipation also shows that better TC performance requires higher energy consumption. Our study unveils a general mechanism by which a biochemical oscillator achieves TC by operating in parameter regimes far from the onset where period-lengthening reactions exist.

3.
Proc Natl Acad Sci U S A ; 120(42): e2303115120, 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37824527

ABSTRACT

The Escherichia coli chemotaxis signaling pathway has served as a model system for the adaptive sensing of environmental signals by large protein complexes. The chemoreceptors control the kinase activity of CheA in response to the extracellular ligand concentration and adapt across a wide concentration range by undergoing methylation and demethylation. Methylation shifts the kinase response curve by orders of magnitude in ligand concentration while incurring a much smaller change in the ligand binding curve. Here, we show that the disproportionate shift in binding and kinase response is inconsistent with equilibrium allosteric models. To resolve this inconsistency, we present a nonequilibrium allosteric model that explicitly includes the dissipative reaction cycles driven by adenosine triphosphate (ATP) hydrolysis. The model successfully explains all existing joint measurements of ligand binding, receptor conformation, and kinase activity for both aspartate and serine receptors. Our results suggest that the receptor complex acts as an enzyme: Receptor methylation modulates the ON-state kinetics of the kinase (e.g., phosphorylation rate), while ligand binding controls the equilibrium balance between kinase ON/OFF states. Furthermore, sufficient energy dissipation is responsible for maintaining and enhancing the sensitivity range and amplitude of the kinase response. We demonstrate that the nonequilibrium allosteric model is broadly applicable to other sensor-kinase systems by successfully fitting previously unexplained data from the DosP bacterial oxygen-sensing system. Overall, this work provides a nonequilibrium physics perspective on cooperative sensing by large protein complexes and opens up research directions for understanding their microscopic mechanisms through simultaneous measurements and modeling of ligand binding and downstream responses.


Subjects
Chemotaxis; Escherichia coli Proteins; Chemotaxis/physiology; Methyl-Accepting Chemotaxis Proteins/metabolism; Escherichia coli Proteins/metabolism; Ligands; Histidine Kinase/metabolism; Escherichia coli/metabolism; Signal Transduction/physiology; Bacterial Proteins/metabolism
4.
Nat Commun ; 14(1): 5907, 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37737245

ABSTRACT

Biological processes are typically actuated by dynamic multi-subunit molecular complexes. However, interactions between subunits, which govern the functions of these complexes, are hard to measure directly. Here, we develop a general approach combining cryo-EM imaging technology and statistical modeling and apply it to study the hexameric clock protein KaiC in Cyanobacteria. By clustering millions of KaiC monomer images, we identify two major conformational states of KaiC monomers. We then classify the conformational states of (>160,000) KaiC hexamers by the thirteen distinct spatial arrangements of these two subunit states in the hexamer ring. We find that distributions of the thirteen hexamer conformational patterns for two KaiC phosphorylation mutants can be fitted quantitatively by an Ising model, which reveals a significant cooperativity between neighboring subunits with phosphorylation shifting the probability of subunit conformation. Our results show that a KaiC hexamer can respond in a switch-like manner to changes in its phosphorylation level.
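
The count of thirteen hexamer patterns and the Ising description can be illustrated with a short sketch. It assumes (this is not stated in the abstract) that two ring arrangements count as the same pattern when they differ only by a rotation or a reflection of the ring, and it uses a generic nearest-neighbor Ising energy with hypothetical parameters `J` (coupling) and `h` (bias), both in units of kT. The function names are illustrative and do not come from the paper.

```python
import math
from itertools import product


def canonical(ring):
    """Return the smallest tuple among all rotations and reflections of a ring."""
    n = len(ring)
    variants = []
    for seq in (ring, ring[::-1]):          # the ring and its mirror image
        for shift in range(n):              # every rotation of each
            variants.append(seq[shift:] + seq[:shift])
    return min(variants)


def distinct_rings(n_subunits=6, n_states=2):
    """List ring patterns that are distinct under rotation and reflection (assumed symmetry)."""
    rings = product(range(n_states), repeat=n_subunits)
    return sorted({canonical(r) for r in rings})


def pattern_probabilities(J, h, n_subunits=6):
    """Boltzmann probability of each distinct pattern for a ring Ising model.

    J and h are hypothetical fit parameters (in units of kT), not values from the paper.
    """
    weights = {}
    for ring in product((0, 1), repeat=n_subunits):
        spins = [2 * s - 1 for s in ring]   # map states {0, 1} to spins {-1, +1}
        coupling = sum(spins[i] * spins[(i + 1) % n_subunits] for i in range(n_subunits))
        energy = -J * coupling - h * sum(spins)
        key = canonical(ring)
        weights[key] = weights.get(key, 0.0) + math.exp(-energy)
    total = sum(weights.values())
    return {pattern: w / total for pattern, w in weights.items()}


print(len(distinct_rings()))  # 13
```

Under this assumed symmetry the enumeration gives 13 patterns for a two-state hexamer; counting rotations only would give 14. Fitting `J` and `h` to measured pattern frequencies is one way such a model could be compared with data.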


Subjects
Circadian Clocks; Cryoelectron Microscopy; CLOCK Proteins; Cluster Analysis; Models, Statistical
5.
J R Soc Interface ; 20(204): 20230276, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37403484

ABSTRACT

Accurate and robust spatial orders are ubiquitous in living systems. In 1952, Turing proposed a general mechanism for pattern formation exemplified by a reaction-diffusion model with two chemical species in a large system. However, in small biological systems such as a cell, the existence of multiple Turing patterns and strong noise can lower the spatial order. Recently, a modified reaction-diffusion model with an additional chemical species was shown to stabilize the Turing pattern. Here, we study the non-equilibrium thermodynamics of this three-species reaction-diffusion model to understand the relationship between energy cost and the performance of self-positioning. By using computational and analytical approaches, we show that beyond the onset of pattern formation the positioning error decreases as energy dissipation increases. In a finite system, we find that a specific Turing pattern exists only within a finite range of total molecule number. Energy dissipation broadens this range, which enhances the robustness of the Turing pattern against molecule number fluctuations in living cells. The generality of these results is verified in a realistic model of the Muk system underlying DNA segregation in Escherichia coli, and testable predictions are made for the dependence of the accuracy and robustness of the spatial pattern on the ATP/ADP ratio.


Subjects
Models, Biological; Diffusion; Thermodynamics
6.
Phys Rev Lett ; 130(23): 237101, 2023 Jun 09.
Article in English | MEDLINE | ID: mdl-37354404

ABSTRACT

Generalization is one of the most important problems in deep learning, where there exist many low-loss solutions due to overparametrization. Previous empirical studies showed a strong correlation between flatness of the loss landscape at a solution and its generalizability, and stochastic gradient descent (SGD) is crucial in finding the flat solutions. To understand the effects of SGD, we construct a simple model whose overall loss landscape has a continuous set of degenerate (or near-degenerate) minima and the loss landscape for a minibatch is approximated by a random shift of the overall loss function. By direct simulations of the stochastic learning dynamics and solving the underlying Fokker-Planck equation, we show that due to its strong anisotropy the SGD noise introduces an additional effective loss term that decreases with flatness and has an overall strength that increases with the learning rate and batch-to-batch variation. We find that the additional landscape-dependent SGD loss breaks the degeneracy and serves as an effective regularization for finding flat solutions. As a result, the flatness of the overall loss landscape increases during learning and reaches a higher value (flatter minimum) for a larger SGD noise strength before the noise strength reaches a critical value when the system fails to converge. These results, which are verified in realistic neural network models, elucidate the role of SGD for generalization, and they may also have important implications for hyperparameter selection for learning efficiently without divergence.


Subjects
Algorithms; Neural Networks, Computer; Stochastic Processes
7.
ArXiv ; 2023 Feb 23.
Article in English | MEDLINE | ID: mdl-36866223

ABSTRACT

The Escherichia coli chemotaxis signaling pathway has served as a model system for studying the adaptive sensing of environmental signals by large protein complexes. The chemoreceptors control the kinase activity of CheA in response to the extracellular ligand concentration and adapt across a wide concentration range by undergoing methylation and demethylation. Methylation shifts the kinase response curve by orders of magnitude in ligand concentration while incurring a much smaller change in the ligand binding curve. Here, we show that this asymmetric shift in binding and kinase response is inconsistent with equilibrium allosteric models regardless of parameter choices. To resolve this inconsistency, we present a nonequilibrium allosteric model that explicitly includes the dissipative reaction cycles driven by ATP hydrolysis. The model successfully explains all existing measurements for both aspartate and serine receptors. Our results suggest that while ligand binding controls the equilibrium balance between the ON and OFF states of the kinase, receptor methylation modulates the kinetic properties (e.g., the phosphorylation rate) of the ON state. Furthermore, sufficient energy dissipation is necessary for maintaining and enhancing the sensitivity range and amplitude of the kinase response. We demonstrate that the nonequilibrium allosteric model is broadly applicable to other sensor-kinase systems by successfully fitting previously unexplained data from the DosP bacterial oxygen-sensing system. Overall, this work provides a new perspective on cooperative sensing by large protein complexes and opens up new research directions for understanding their microscopic mechanisms through simultaneous measurements and modeling of ligand binding and downstream responses.

8.
ArXiv ; 2023 Dec 29.
Article in English | MEDLINE | ID: mdl-38235063

ABSTRACT

The Escherichia coli chemoreceptors form an extensive array that achieves cooperative and adaptive sensing of extracellular signals. The receptors control the activity of histidine kinase CheA, which drives a non-equilibrium phosphorylation-dephosphorylation reaction cycle for response regulator CheY. Recent single-cell FRET measurements revealed that kinase activity of the array spontaneously switches between active and inactive states, with asymmetric switching times that signify time-reversal symmetry breaking in the underlying dynamics. Here, we show that the asymmetric switching dynamics can be explained by a non-equilibrium lattice model, which considers both the dissipative reaction cycles of individual core units and the coupling between neighboring units. The model reveals that large dissipation and near-critical coupling are required to explain the observed switching dynamics. Microscopically, the switching time asymmetry originates from irreversible transition paths. The model shows that strong dissipation enables sensitive and rapid signaling response by relieving the speed-sensitivity trade-off, which can be tested by future single-cell experiments. Overall, our model provides a general framework for studying biological complexes composed of coupled subunits that are individually driven by dissipative cycles and the rich non-equilibrium physics within.

9.
Nat Commun ; 13(1): 5327, 2022 Sep 10.
Article in English | MEDLINE | ID: mdl-36088344

ABSTRACT

Adaptation is a defining feature of living systems. The bacterial flagellar motor adapts to changes in the external mechanical load by adding or removing torque-generating (stator) units. But the molecular mechanism behind this mechano-adaptation remains unclear. Here, we combine single-motor electrorotation experiments and theoretical modeling to show that mechano-adaptation of the flagellar motor is enabled by multiple mechanosensitive internal states. Dwell time statistics from experiments suggest the existence of at least two bound states with a high and a low unbinding rate, respectively. A first-passage-time analysis of a four-state model quantitatively explains the experimental data and determines the transition rates among all four states. The torque generated by bound stator units controls their effective unbinding rate by modulating the transition between the bound states, possibly via a catch bond mechanism. Similar force-mediated feedback enabled by multiple internal states may apply to adaptation in other macromolecular complexes.


Subjects
Flagella; Molecular Motor Proteins; Acclimatization; Bacteria/metabolism; Flagella/metabolism; Molecular Motor Proteins/metabolism; Torque
10.
Front Microbiol ; 13: 866141, 2022.
Article in English | MEDLINE | ID: mdl-35694287

ABSTRACT

In this article, we develop a mathematical model for the rotary bacterial flagellar motor (BFM) based on the recently discovered structure of the stator complex (MotA5MotB2). The structure suggested that the stator also rotates. The BFM is modeled as two rotating nano-rings that interact with each other. Specifically, translocation of protons through the stator complex drives rotation of the MotA pentamer ring, which in turn drives rotation of the FliG ring in the rotor via interactions between the MotA ring of the stator and the FliG ring of the rotor. Preliminary results from the structure-informed model are consistent with the observed torque-speed relation. More importantly, the model predicts distinctive rotor and stator dynamics and their load dependence, which may be tested by future experiments. Possible approaches to verify and improve the model to further understand the molecular mechanism for torque generation in BFM are also discussed.

11.
Phys Rev X ; 12(1), 2022.
Article in English | MEDLINE | ID: mdl-35756903

ABSTRACT

Protein concentration in a living cell fluctuates over time due to noise in growth and division processes. In the high expression regime, the variance of the protein concentration in a cell was found to scale with the square of the mean, which belongs to a general phenomenon called Taylor's law (TL). To understand the origin of these fluctuations, we measured protein concentration dynamics in single E. coli cells from a set of strains with a variable expression of fluorescent proteins. The protein expression is controlled by a set of constitutive promoters with different strengths, which allows the expression level to be varied over 2 orders of magnitude without introducing noise from fluctuations in transcription regulators. Our data confirm the square TL, but the prefactor A has a cell-to-cell variation independent of the promoter strength. Distributions of the normalized protein concentration for different promoters are found to collapse onto the same curve. To explain these observations, we used a minimal mechanistic model to describe the stochastic growth and division processes in a single cell with a feedback mechanism for regulating cell division. In the high expression regime where extrinsic noise dominates, the model reproduces our experimental results quantitatively. By using a mean-field approximation in the minimal model, we showed that the stochastic dynamics of protein concentration is described by a Langevin equation with multiplicative noise. The Langevin equation has a scale invariance which is responsible for the square TL. By solving the Langevin equation, we obtained an analytical solution for the protein concentration distribution function that agrees with experiments. The solution shows explicitly how the prefactor A depends on the strengths of different noise sources, which explains its cell-to-cell variability. By using this approach to analyze our single-cell data, we found that the noise in production rate dominates the noise from cell division. The deviation from the square TL in the low expression regime can also be captured in our model by including intrinsic noise in the production rate.
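
The square Taylor's law described in this abstract can be written compactly. The symbols $c$ (protein concentration), $\langle c \rangle$ (its mean), and $\sigma_c^2$ (its variance) are notation assumed here for illustration; only the prefactor $A$ is named in the abstract.

```latex
\sigma_c^{2} \;=\; A\,\langle c \rangle^{2}
\qquad\Longrightarrow\qquad
\mathrm{CV} \;\equiv\; \frac{\sigma_c}{\langle c \rangle} \;=\; \sqrt{A}
```

Read this way, the square law says the coefficient of variation does not depend on the mean expression level, which is consistent with the reported collapse of the normalized concentration distributions for different promoters.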

12.
Sci Adv ; 8(26): eabn0080, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35767611

ABSTRACT

The highly ramified arbors of neuronal dendrites provide the substrate for the high connectivity and computational power of the brain. Altered dendritic morphology is associated with neuronal diseases. Many molecules have been shown to play crucial roles in shaping and maintaining dendrite morphology. However, the underlying principles by which molecular interactions generate branched morphologies are not understood. To elucidate these principles, we visualized the growth of dendrites throughout larval development of Drosophila sensory neurons and found that the tips of dendrites undergo dynamic instability, transitioning rapidly and stochastically between growing, shrinking, and paused states. By incorporating these measured dynamics into an agent-based computational model, we showed that the complex and highly variable dendritic morphologies of these cells are a consequence of the stochastic dynamics of their dendrite tips. These principles may generalize to branching of other neuronal cell types, as well as to branching at the subcellular and tissue levels.

13.
Phys Rev E ; 105(4-1): 044140, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35590650

ABSTRACT

Nonequilibrium reaction networks (NRNs) underlie most biological functions. Despite their diverse dynamic properties, NRNs share the signature characteristics of persistent probability fluxes and continuous energy dissipation even in the steady state. Dynamics of NRNs can be described at different coarse-grained levels. Our previous work showed that the apparent energy dissipation rate at a coarse-grained level follows an inverse power-law dependence on the scale of coarse-graining. The scaling exponent is determined by the network structure and correlation of stationary probability fluxes. However, it remains unclear whether and how the (renormalized) flux correlation varies with coarse-graining. Following Kadanoff's real space renormalization group (RG) approach for critical phenomena, we address this question by developing a state-space renormalization group theory for NRNs, which leads to an iterative RG equation for the flux correlation function. In square and hypercubic lattices, we solve the RG equation exactly and find two types of fixed point solutions. There is a family of nontrivial fixed points where the correlation exhibits power-law decay, characterized by a power exponent that can take any value within a continuous range. There is also a trivial fixed point where the correlation vanishes beyond the nearest neighbors. The power-law fixed point is stable if and only if the power exponent is less than the lattice dimension n. Consequently, the correlation function converges to the power-law fixed point only when the correlation in the fine-grained network decays slower than r^{-n} and to the trivial fixed point otherwise. If the flux correlation in the fine-grained network contains multiple stable solutions with different exponents, the RG iteration dynamics select the fixed point solution with the smallest exponent. The analytical results are supported by numerical simulations. We also discuss a possible connection between the RG flows of flux correlation and those of the Kosterlitz-Thouless transition.
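
The stability statement in this abstract can be summarized in symbols. The notation $C(r)$ for the flux correlation at distance $r$ and $\alpha$ for the power exponent is assumed here for illustration; $n$ is the lattice dimension as in the abstract.

```latex
C(r) \sim r^{-\alpha}
\quad\text{(power-law fixed point)},
\qquad
\text{stable} \iff \alpha < n ;
\qquad
C(r) = 0 \ \text{for } r > 1
\quad\text{(trivial fixed point, reached when } \alpha \ge n\text{)}
```

Here $r = 1$ is taken to denote nearest neighbors, which is a convention chosen for this summary rather than one given in the abstract.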

14.
Rev Neurosci ; 33(2): 111-132, 2022 Feb 23.
Article in English | MEDLINE | ID: mdl-34271607

ABSTRACT

The piriform cortex is rich in recurrent excitatory synaptic connections between pyramidal neurons. We asked how such connections could shape cortical responses to olfactory lateral olfactory tract (LOT) inputs. For this, we constructed a computational network model of anterior piriform cortex with 2000 multicompartment, multiconductance neurons (500 semilunar, 1000 layer 2 and 500 layer 3 pyramids; 200 superficial interneurons of two types; 500 deep interneurons of three types; 500 LOT afferents), incorporating published and unpublished data. With a given distribution of LOT firing patterns, and increasing the strength of recurrent excitation, a small number of firing patterns were observed in pyramidal cell networks: first, sparse firings; then temporally and spatially concentrated epochs of action potentials, wherein each neuron fires one or two spikes; then more synchronized events, associated with bursts of action potentials in some pyramidal neurons. We suggest that one function of anterior piriform cortex is to transform ongoing streams of input spikes into temporally focused spike patterns, called here "cell assemblies", that are salient for downstream projection areas.


Subjects
Piriform Cortex; Action Potentials/physiology; Humans; Neurons/physiology; Olfactory Bulb/physiology; Piriform Cortex/physiology; Pyramidal Cells/physiology
15.
Phys Rev Lett ; 129(27): 278001, 2022 Dec 30.
Article in English | MEDLINE | ID: mdl-36638284

ABSTRACT

We study the energy cost of flocking in the active Ising model (AIM) and show that, besides the energy cost for self-propelled motion, an additional energy dissipation is required to power the alignment of spins. We find that this additional alignment dissipation reaches its maximum at the flocking transition point in the form of a cusp with a discontinuous first derivative with respect to the control parameter. To understand this singular behavior, we analytically solve the two- and three-site AIM models and obtain the exact dependence of the alignment dissipation on the flocking order parameter and control parameter, which explains the cusped dissipation maximum at the flocking transition. Our results reveal a trade-off between the energy cost of the system and its performance measured by the flocking speed and sensitivity to external perturbations. This trade-off relationship provides a new perspective for understanding the dynamics of natural flocks and designing optimal artificial flocking systems.

16.
Front Comput Neurosci ; 15: 730431, 2021.
Article in English | MEDLINE | ID: mdl-34744674

ABSTRACT

In Drosophila, olfactory information received by olfactory receptor neurons (ORNs) is first processed by an incoherent feed forward neural circuit in the antennal lobe (AL) that consists of ORNs (input), inhibitory local neurons (LNs), and projection neurons (PNs). This "early" olfactory information processing has two important characteristics. First, response of a PN to its cognate ORN is normalized by the overall activity of other ORNs, a phenomenon termed "divisive normalization." Second, PNs respond strongly to the onset of ORN activities, but they adapt to prolonged or continuously varying inputs. Despite the importance of these characteristics for learning and memory, their underlying mechanisms are not fully understood. Here, we develop a circuit model for describing the ORN-LN-PN dynamics by including key neuron-neuron interactions such as short-term plasticity (STP) and presynaptic inhibition (PI). By fitting our model to experimental data quantitatively, we show that a strong STP balanced between short-term facilitation (STF) and short-term depression (STD) is responsible for the observed nonlinear divisive normalization in Drosophila. Our circuit model suggests that either STP or PI alone can lead to adaptive response. However, by comparing our model results with experimental data, we find that both STP and PI work together to achieve a strong and robust adaptive response. Our model not only helps reveal the mechanisms underlying two main characteristics of the early olfactory process, it can also be used to predict PN responses to arbitrary time-dependent signals and to infer microscopic properties of the circuit (such as the strengths of STF and STD) from the measured input-output relation. Our circuit model may be useful for understanding the role of STP in other sensory systems.

17.
Nat Commun ; 12(1): 3125, 2021 May 25.
Article in English | MEDLINE | ID: mdl-34035278

ABSTRACT

Searching for possible biochemical networks that perform a certain function is a challenge in systems biology. For simple functions and small networks, this can be achieved through an exhaustive search of the network topology space. However, it is difficult to scale this approach up to larger networks and more complex functions. Here we tackle this problem by training a recurrent neural network (RNN) to perform the desired function. By developing a systematic perturbative method to interrogate the successfully trained RNNs, we are able to distill the underlying regulatory network among the biological elements (genes, proteins, etc.). Furthermore, we show several cases where the regulation networks found by RNN can achieve the desired biological function when its edges are expressed by more realistic response functions, such as the Hill-function. This method can be used to link topology and function by helping uncover the regulation logic and network topology for complex tasks.


Subjects
Algorithms; Gene Regulatory Networks; Models, Genetic; Neural Networks, Computer; Animals; Computational Biology/methods; Computer Simulation; Gene Expression Regulation; Humans; Reproducibility of Results; Software
18.
Proc Natl Acad Sci U S A ; 118(15), 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33876769

ABSTRACT

Motility is important for the survival and dispersal of many bacteria, and it often plays a role during infections. Regulation of bacterial motility by chemical stimuli is well studied, but recent work has added a new dimension to the problem of motility control. The bidirectional flagellar motor of the bacterium Escherichia coli recruits or releases torque-generating units (stator units) in response to changes in load. Here, we show that this mechanosensitive remodeling of the flagellar motor is independent of direction of rotation. Remodeling rate constants in clockwise rotating motors and in counterclockwise rotating motors, measured previously, fall on the same curve if plotted against torque. Increased torque decreases the off rate of stator units from the motor, thereby increasing the number of active stator units at steady state. A simple mathematical model based on observed dynamics provides quantitative insight into the underlying molecular interactions. The torque-dependent remodeling mechanism represents a robust strategy to quickly regulate output (torque) in response to changes in demand (load).
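
The qualitative claim that a torque-dependent off rate raises the steady-state number of stator units can be illustrated with a minimal two-state binding sketch. Everything specific here is an assumption for illustration and not taken from the paper: the kinetic scheme dN/dt = k_on (N_max - N) - k_off(torque) N, the exponential form of `off_rate`, and all parameter values (`k_on`, `k_off0`, `torque_scale`, `n_max`).

```python
import math


def off_rate(torque, k_off0=0.1, torque_scale=100.0):
    """Hypothetical off rate that decreases exponentially with torque."""
    return k_off0 * math.exp(-torque / torque_scale)


def steady_state_stators(torque, k_on=0.02, n_max=11):
    """Steady state of dN/dt = k_on * (n_max - N) - k_off(torque) * N.

    Setting dN/dt = 0 gives N = n_max * k_on / (k_on + k_off).
    """
    k_off = off_rate(torque)
    return n_max * k_on / (k_on + k_off)


# Higher torque lowers the off rate, so more stator units stay bound.
for torque in (0.0, 100.0, 300.0):
    print(torque, steady_state_stators(torque))
```

With any off rate that decreases monotonically in torque, the steady-state occupancy in this scheme rises with torque and saturates below `n_max`, which matches the direction of the effect described in the abstract without reproducing its fitted rates.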


Subjects
Flagella/chemistry; Mechanotransduction, Cellular; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Escherichia coli; Flagella/metabolism; Models, Theoretical; Rotation
19.
Phys Rev Lett ; 126(8): 080601, 2021 Feb 26.
Article in English | MEDLINE | ID: mdl-33709722

ABSTRACT

The energy dissipation rate in a nonequilibrium reaction system can be determined by the reaction rates in the underlying reaction network. By developing a coarse-graining process in state space and a corresponding renormalization procedure for reaction rates, we find that energy dissipation rate has an inverse power-law dependence on the number of microscopic states in a coarse-grained state. The dissipation scaling law requires self-similarity of the underlying network, and the scaling exponent depends on the network structure and the probability flux correlation. Existence of the inverse dissipation scaling law is shown in realistic biochemical systems such as biochemical oscillators and microtubule-kinesin active flow systems.
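
The inverse power-law dependence stated in this abstract can be written as follows. The symbols are assumed here for illustration: $\dot{W}(m)$ for the apparent energy dissipation rate when each coarse-grained state contains $m$ microscopic states, and $\lambda > 0$ for the scaling exponent.

```latex
\dot{W}(m) \;\propto\; m^{-\lambda},
\qquad
\lambda > 0 \ \text{set by the network structure and the probability flux correlation}
```

On a log-log plot of dissipation rate against $m$, such a law would appear as a straight line with slope $-\lambda$.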


Subjects
Models, Theoretical; Energy Metabolism; Entropy; Kinesins/chemistry; Kinesins/metabolism; Kinetics; Microtubules/chemistry; Microtubules/metabolism
20.
Proc Natl Acad Sci U S A ; 118(9), 2021 Mar 02.
Article in English | MEDLINE | ID: mdl-33619091

ABSTRACT

Despite tremendous success of the stochastic gradient descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions at flat minima of the loss function in high-dimensional weight space. Here, we investigate the connection between SGD learning dynamics and the loss function landscape. A principal component analysis (PCA) shows that SGD dynamics follow a low-dimensional drift-diffusion motion in the weight space. Around a solution found by SGD, the loss function landscape can be characterized by its flatness in each PCA direction. Remarkably, our study reveals a robust inverse relation between the weight variance and the landscape flatness in all PCA directions, which is the opposite to the fluctuation-response relation (aka Einstein relation) in equilibrium statistical physics. To understand the inverse variance-flatness relation, we develop a phenomenological theory of SGD based on statistical properties of the ensemble of minibatch loss functions. We find that both the anisotropic SGD noise strength (temperature) and its correlation time depend inversely on the landscape flatness in each PCA direction. Our results suggest that SGD serves as a landscape-dependent annealing algorithm. The effective temperature decreases with the landscape flatness so the system seeks out (prefers) flat minima over sharp ones. Based on these insights, an algorithm with landscape-dependent constraints is developed to mitigate catastrophic forgetting efficiently when learning multiple tasks sequentially. In general, our work provides a theoretical framework to understand learning dynamics, which may eventually lead to better algorithms for different learning tasks.
