Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation.
Zadorozhnyy, Vasily; Mucllari, Edison; Pospisil, Cole; Nguyen, Duc; Ye, Qiang.
Affiliation
  • Zadorozhnyy V; Department of Mathematics, University of Kentucky, Lexington, KY 40506, U.S.A. vasily.zadorozhnyy@uky.edu.
  • Mucllari E; Department of Mathematics, University of Kentucky, Lexington, KY 40506, U.S.A. edison.mucllari@uky.edu.
  • Pospisil C; Department of Mathematics, University of Kentucky, Lexington, KY 40506, U.S.A. Cole.Pospisil@uky.edu.
  • Nguyen D; Department of Mathematics, University of Kentucky, Lexington, KY 40506, U.S.A. ducnguyen@uky.edu.
  • Ye Q; Department of Mathematics, University of Kentucky, Lexington, KY 40506, U.S.A. qye3@uky.edu.
Neural Comput ; : 1-26, 2024 Sep 23.
Article in En | MEDLINE | ID: mdl-39312497
ABSTRACT
In recent years, orthogonal matrices have been shown to be a promising way to improve the training, stability, and convergence of recurrent neural networks (RNNs), particularly by controlling gradients. While the gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent the exploding gradient problem and enhance long-term memory. We study where to use orthogonal matrices and propose a Neumann series-based scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley orthogonal GRU (NC-GRU). We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU and several other RNNs.
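
The abstract names the technique but does not give its formulas, so the following is only a rough sketch of the general idea and not the authors' implementation. It assumes the standard scaled Cayley transform W = (I + A)^{-1} (I - A) D, where A is skew-symmetric and D is a diagonal matrix with entries of +1 or -1, and it assumes that the inverse is approximated by a truncated Neumann series, (I + A)^{-1} ≈ I - A + A^2 - ..., which converges only when the norm of A is below 1. All function names and the example matrices are hypothetical.

```python
# Sketch only: scaled Cayley transform W = (I + A)^{-1} (I - A) D with the
# inverse replaced by a truncated Neumann series. Pure Python, no dependencies.
# The formulation is an assumption; the abstract does not spell it out.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y, sign=1.0):
    # Elementwise X + sign * Y.
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def neumann_inverse(A, terms):
    """Approximate (I + A)^{-1} by the sum of (-A)^k for k = 0 .. terms - 1."""
    neg_A = [[-a for a in row] for row in A]
    total = identity(len(A))
    power = identity(len(A))
    for _ in range(terms - 1):
        power = matmul(power, neg_A)   # next power of (-A)
        total = add(total, power)
    return total

def neumann_cayley(A, D_diag, terms=30):
    """Approximate the scaled Cayley transform of a skew-symmetric A."""
    n = len(A)
    W = matmul(neumann_inverse(A, terms), add(identity(n), A, sign=-1.0))
    # Right-multiplying by diagonal D scales column j by D_diag[j].
    return [[W[i][j] * D_diag[j] for j in range(n)] for i in range(n)]

def orthogonality_error(W):
    """Largest absolute entry of W^T W - I; zero for an exactly orthogonal W."""
    G = matmul(transpose(W), W)
    I = identity(len(W))
    return max(abs(G[i][j] - I[i][j])
               for i in range(len(W)) for j in range(len(W)))

# Hypothetical 3x3 skew-symmetric matrix with small entries (norm below 1).
A = [[0.0, 0.2, -0.1],
     [-0.2, 0.0, 0.3],
     [0.1, -0.3, 0.0]]
D_diag = [1.0, -1.0, 1.0]

W = neumann_cayley(A, D_diag, terms=30)
err = orthogonality_error(W)
coarse_err = orthogonality_error(neumann_cayley(A, D_diag, terms=2))
```

With enough series terms the result is orthogonal up to rounding, while a very short truncation leaves a visible deviation from orthogonality. In this sketch the truncation length trades accuracy against the cost of the extra matrix products, and the approach depends on A staying small in norm.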

Full text: 1 Databases: MEDLINE Language: En Journal: Neural Comput Journal subject: Medical Informatics Year of publication: 2024 Document type: Article Country of affiliation: United States
