STACoRe: Spatio-temporal and action-based contrastive representations for reinforcement learning in Atari.
Lee, Young Jae; Kim, Jaehoon; Kwak, Mingu; Park, Young Joon; Kim, Seoung Bum.
Affiliations
  • Lee YJ; School of Industrial and Management Engineering, Korea University, Seoul, Republic of Korea.
  • Kim J; School of Industrial and Management Engineering, Korea University, Seoul, Republic of Korea.
  • Kwak M; School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
  • Park YJ; LG AI Research, Seoul, Republic of Korea. Electronic address: yj.park@lgresearch.ai.
  • Kim SB; School of Industrial and Management Engineering, Korea University, Seoul, Republic of Korea. Electronic address: sbkim1@korea.ac.kr.
Neural Netw; 160: 1-11, 2023 Mar.
Article in En | MEDLINE | ID: mdl-36587439
ABSTRACT
With the development of deep learning technology, deep reinforcement learning (DRL) has successfully built intelligent agents for sequential decision-making problems through interaction with image-based environments. However, learning from unlimited interaction is impractical and sample inefficient because training an agent requires extensive trial and error and numerous samples. One response to this problem is sample-efficient DRL, a research area that encourages learning effective state representations from limited interactions with image-based environments. Previous methods have surpassed human performance by training an RL agent with self-supervised learning and data augmentation to learn good state representations from a given interaction budget. However, most existing methods consider only the similarity of image observations, which makes it difficult to capture semantic representations. To address these challenges, we propose spatio-temporal and action-based contrastive representation (STACoRe) learning for sample-efficient DRL. STACoRe performs two types of contrastive learning to learn proper state representations: one uses the agent's actions as pseudo-labels, and the other uses spatio-temporal information. In particular, for the action-based contrastive learning, we propose a method that automatically selects data augmentation techniques suitable for each environment, which stabilizes model training. We train the model by simultaneously optimizing an action-based contrastive loss function and spatio-temporal contrastive loss functions in an end-to-end manner, which improves sample efficiency for DRL. We evaluate on 26 benchmark games in Atari 2600 where environment interaction is limited to 100k steps. The experimental results confirm that our method is more sample efficient than existing methods. The code is available at https://github.com/dudwojae/STACoRe.
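For illustration, below is a minimal sketch, not the authors' implementation (see the linked repository for that), of how the two contrastive terms described in the abstract can be combined: an action-based, SupCon-style loss that uses actions as pseudo-labels, plus spatio-temporal InfoNCE losses, optimized jointly. The use of PyTorch, the tensor shapes, and the equal loss weighting are all assumptions made for the sketch.

import torch
import torch.nn.functional as F

def action_contrastive_loss(z, actions, temperature=0.1):
    # SupCon-style loss: embeddings whose transitions share the same
    # action are treated as positives (actions serve as pseudo-labels).
    z = F.normalize(z, dim=1)
    n = z.size(0)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = ((actions.unsqueeze(0) == actions.unsqueeze(1)) & ~self_mask).float()
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()  # numerical stability
    exp_logits = torch.exp(logits) * (~self_mask).float()        # exclude self-pairs
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0   # skip anchors whose action appears only once in the batch
    return -((log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]).mean()

def info_nce_loss(z_a, z_b, temperature=0.1):
    # InfoNCE: z_a[i] and z_b[i] form a positive pair, e.g. two augmented
    # views of one frame (spatial) or embeddings at t and t+1 (temporal).
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random latents standing in for encoder outputs.
batch, dim = 32, 64
z_t = torch.randn(batch, dim, requires_grad=True)      # observations at t
z_t_aug = torch.randn(batch, dim, requires_grad=True)  # augmented views of t
z_t1 = torch.randn(batch, dim, requires_grad=True)     # observations at t+1
actions = torch.randint(0, 18, (batch,))               # Atari discrete actions

# Hypothetical equal weighting; in practice the RL loss (e.g. Rainbow's)
# is added and everything is optimized end-to-end.
total = (action_contrastive_loss(z_t, actions)
         + info_nce_loss(z_t, z_t_aug)    # spatial term
         + info_nce_loss(z_t, z_t1))      # temporal term
total.backward()

In the actual method the positive pairs would come from an encoder over augmented observations, with the augmentations chosen per environment by the automatic selection mechanism the abstract mentions; that selection step is omitted here because the abstract does not specify it.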

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Benchmarking / Intelligence Study type: Prognostic_studies Limit: Humans Language: En Journal: Neural Netw Journal subject: NEUROLOGY Publication year: 2023 Document type: Article
