ABSTRACT
We propose a class of novel fractional-order optimization algorithms. We define a fractional-order gradient via the Caputo fractional derivative that generalizes the integer-order gradient; we refer to it as the Caputo fractional-based gradient and develop an efficient implementation to compute it. A general class of fractional-order optimization methods is then obtained by replacing integer-order gradients with Caputo fractional-based gradients. To give concrete algorithms, we consider gradient descent (GD) and Adam, and extend them to the Caputo fractional GD (CfGD) and the Caputo fractional Adam (CfAdam). We demonstrate the superiority of CfGD and CfAdam on several large-scale optimization problems that arise in scientific machine learning applications, such as ill-conditioned least squares problems on real-world data and the training of neural networks with non-convex objective functions. Numerical examples show that both CfGD and CfAdam yield acceleration over GD and Adam, respectively. We also derive error bounds of CfGD for quadratic functions, which further indicate that CfGD could mitigate the dependence of the convergence rate on the condition number and thus achieve significant acceleration over GD.
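As a rough illustration of the idea (not the authors' implementation), the one-dimensional sketch below approximates the Caputo fractional derivative of order alpha in (0, 1), (1/Γ(1-α)) ∫_c^x f'(t)(x-t)^(-α) dt, by quadrature and substitutes it for the ordinary derivative in a GD loop. The function names `caputo_grad` and `cfgd`, the quadrature scheme, and all parameter values are our own choices, not taken from the paper.

```python
import math
import numpy as np

def caputo_grad(f_prime, x, c, alpha, n=1000):
    """Approximate the Caputo fractional derivative of order alpha in (0, 1):
        (1 / Gamma(1 - alpha)) * int_c^x f'(t) (x - t)^(-alpha) dt.
    The substitution s = (x - t)^(1 - alpha) removes the endpoint singularity
    at t = x, so a plain midpoint rule converges quickly."""
    S = (x - c) ** (1.0 - alpha)
    s = (np.arange(n) + 0.5) * S / n          # midpoints in the s variable
    t = x - s ** (1.0 / (1.0 - alpha))        # map back to the t variable
    return np.sum(f_prime(t)) * S / (n * (1.0 - alpha) * math.gamma(1.0 - alpha))

def cfgd(f_prime, x0, c, alpha=0.8, lr=0.1, steps=200):
    """Gradient descent with the integer-order derivative replaced by the
    Caputo fractional derivative (1-D sketch; iterates must stay above c)."""
    x = x0
    for _ in range(steps):
        x -= lr * caputo_grad(f_prime, x, c, alpha)
    return x
```

For f(x) = x^2 with lower terminal c = 0, the Caputo derivative has the closed form Γ(3)/Γ(3-α) · x^(2-α), which the quadrature reproduces and which the CfGD loop drives toward the minimizer at 0.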
Subjects
Algorithms, Neural Networks (Computer), Machine Learning

ABSTRACT
We present an analysis of the approximation rates of operator learning in Chen and Chen (1995) and Lu et al. (2021), where continuous operators are approximated by a sum of products of branch and trunk networks. In this work, we consider the rates of learning solution operators of both linear and nonlinear advection-diffusion equations, with or without reaction. We find that the approximation rates depend on the architecture of the branch networks as well as on the smoothness of the inputs and outputs of the solution operators.
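The branch-trunk factorization described above, G(u)(y) ≈ Σ_k b_k(u) t_k(y), can be sketched with tiny untrained networks; the snippet below only shows the structure of the approximation, and all sizes, names, and random weights are illustrative assumptions rather than anything from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(widths):
    # hypothetical tiny MLP with tanh activations and fixed random weights
    Ws = [rng.normal(size=(a, b)) / np.sqrt(a)
          for a, b in zip(widths[:-1], widths[1:])]
    def forward(z):
        for W in Ws[:-1]:
            z = np.tanh(z @ W)
        return z @ Ws[-1]
    return forward

m, p = 50, 20               # sensor count and number of branch/trunk pairs
branch = mlp([m, 64, p])    # encodes the input function u at m sensor points
trunk = mlp([1, 64, p])     # encodes the query location y

def G(u_sensors, y):
    """Operator output as a sum of products of branch and trunk features."""
    b = branch(u_sensors[None, :])   # shape (1, p)
    t = trunk(np.atleast_2d(y))      # shape (n_y, p)
    return (t * b).sum(axis=1)       # shape (n_y,)
```

In training, the branch and trunk weights would be fit jointly on input-output pairs of the solution operator; here they are random only to keep the sketch self-contained.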
Subjects
Algorithms

ABSTRACT
We propose GENERIC formalism informed neural networks (GFINNs), which obey the symmetric degeneracy conditions of the GENERIC formalism. GFINNs comprise two modules, each of which contains two components. We model each component with a neural network whose architecture is designed to satisfy the required conditions. The component-wise architecture design provides flexible ways of embedding available physics information into the neural networks. We prove theoretically that GFINNs are sufficiently expressive to learn the underlying equations, thereby establishing a universal approximation theorem. We demonstrate the performance of GFINNs in three simulation problems: gas containers exchanging heat and volume, a thermoelastic double pendulum, and Langevin dynamics. In all examples, GFINNs outperform existing methods, demonstrating good accuracy in predictions for both deterministic and stochastic systems. This article is part of the theme issue 'Data-driven prediction in dynamical systems'.
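In the GENERIC formalism, the dynamics take the form dz/dt = L∇E + M∇S, and the symmetric degeneracy conditions are L∇S = 0 and M∇E = 0, with L skew-symmetric and M symmetric positive semi-definite; together these imply energy conservation and entropy non-decrease. The sketch below enforces the conditions by construction using orthogonal projectors, a deliberately simplified stand-in (the paper's actual parameterization differs), with placeholder gradient values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
gE = rng.normal(size=n)   # gradient of energy E at a state z (placeholder)
gS = rng.normal(size=n)   # gradient of entropy S at the same state (placeholder)

def proj_out(p):
    # orthogonal projector onto the complement of span{p}
    return np.eye(len(p)) - np.outer(p, p) / (p @ p)

PS, PE = proj_out(gS), proj_out(gE)
K = rng.normal(size=(n, n))
L = PS @ (K - K.T) @ PS    # skew-symmetric, and L @ gS = 0 by construction
A = rng.normal(size=(n, n))
M = PE @ (A @ A.T) @ PE    # symmetric PSD, and M @ gE = 0 by construction

zdot = L @ gE + M @ gS     # GENERIC vector field
dE = gE @ zdot             # = 0: energy is conserved
dS = gS @ zdot             # >= 0: entropy is non-decreasing
```

Since L is skew-symmetric and M∇E = 0, dE/dt = ∇E·(L∇E) + ∇E·(M∇S) vanishes term by term, while dS/dt = ∇S·M∇S = |A^T P_E ∇S|^2 ≥ 0; in a GFINN these structural guarantees hold for learned, state-dependent L and M as well.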