A statistical property of multiagent learning based on Markov decision process.

Iwata, Kazunori; Ikeda, Kazushi; Sakai, Hideaki

Affiliation

Iwata K; Faculty of Information Sciences, Hiroshima City University, Japan. kiwata@im.hiroshima-cu.ac.jp

IEEE Trans Neural Netw; 17(4): 829-42, 2006 Jul.

Article in English | MEDLINE | ID: mdl-16856649

ABSTRACT

We exhibit an important property called the asymptotic equipartition property (AEP) on empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning, such as reinforcement learning (RL), near the end of the learning process. We examine the effect of the conditions among the agents on the achievement of a cooperative policy in three different cases: blind, visible, and communicable. Also, we derive a bound on the speed with which the empirical sequence converges to the best sequence in probability, so that the multiagent learning yields the best cooperative result.
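As a rough illustration of what the AEP says, the sketch below uses a single two-state ergodic Markov chain, not the multiagent MDP studied in the paper. It samples a sequence x_1..x_n and compares -(1/n) log2 Pr(x_1..x_n) with the chain's entropy rate; the two should be close for large n. The transition matrix `P` and the function names are assumptions chosen for this example only.

```python
import math
import random

# Hypothetical 2-state transition matrix (illustration only, not from the paper).
# P[i][j] = Pr(next state = j | current state = i); all entries are positive,
# so the chain is ergodic.
P = [[0.9, 0.1],
     [0.4, 0.6]]


def stationary(P):
    """Stationary distribution of a 2-state chain, in closed form."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]


def entropy_rate(P):
    """Entropy rate in bits: H = -sum_i pi_i * sum_j P_ij * log2(P_ij)."""
    pi = stationary(P)
    return -sum(pi[i] * P[i][j] * math.log2(P[i][j])
                for i in range(2) for j in range(2))


def empirical_rate(P, n, seed=0):
    """Sample x_1..x_n from the chain and return -(1/n) * log2 Pr(x_1..x_n)."""
    rng = random.Random(seed)
    pi = stationary(P)
    # Draw the first state from the stationary distribution.
    state = 0 if rng.random() < pi[0] else 1
    log_prob = math.log2(pi[state])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[state][0] else 1
        log_prob += math.log2(P[state][nxt])
        state = nxt
    return -log_prob / n


if __name__ == "__main__":
    print("entropy rate:", entropy_rate(P))
    for n in (100, 10_000, 200_000):
        print(n, empirical_rate(P, n))
```

Under these assumptions, the printed empirical values should move toward the entropy rate as n grows, which is the sense in which almost all long empirical sequences are "typical" and share roughly the same probability.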

Subjects

Learning; Markov Chains; Statistical Models

Collections: 01-internacional | Database: MEDLINE | Main subject: Markov Chains / Statistical Models / Learning | Type of study: Health_economic_evaluation / Prognostic_studies / Risk_factors_studies | Language: En | Journal: IEEE Trans Neural Netw | Journal subject: Medical Informatics | Year of publication: 2006 | Document type: Article | Country of affiliation: Japan
