Search | Portal Regional da BVS

Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning.

Iwata, Kazunori.

IEEE Trans Neural Netw Learn Syst ; 28(8): 1865-1877, 2017 Aug.

Article in English | MEDLINE | ID: mdl-27187974

ABSTRACT

Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests.
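For readers unfamiliar with the method, the following is a minimal sketch of softmax (Boltzmann) action selection in its common textbook form, where a single temperature parameter controls how strongly higher-valued actions are preferred. The function names, the `temperature` parameter, and the list of action-value estimates are illustrative assumptions; this is not the modified parameter setting the paper proposes.

```python
import math
import random


def softmax_probs(q_values, temperature):
    """Return Boltzmann selection probabilities for a list of action values.

    Assumed textbook form: P(a) is proportional to exp(Q(a) / temperature).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(q_values, temperature, rng=random):
    """Sample one action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A high temperature makes the choice close to uniform, while a low temperature makes it close to greedy, which is why the quality of the learned policy can be sensitive to this one parameter.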

A phenotypic drug discovery study on thienodiazepine derivatives as inhibitors of T cell proliferation induced by CD28 co-stimulation leads to the discovery of a first bromodomain inhibitor.

Endo, Junichi; Hikawa, Hidemasa; Hamada, Maiko; Ishibuchi, Seigo; Fujie, Naoto; Sugiyama, Naoki; Tanaka, Minoru; Kobayashi, Haruhito; Sugahara, Kunio; Oshita, Koichi; Iwata, Kazunori; Ooike, Shinsuke; Murata, Meguru; Sumichika, Hiroshi; Chiba, Kenji; Adachi, Kunitomo.

Bioorg Med Chem Lett ; 26(5): 1365-70, 2016 Mar 01.

Article in English | MEDLINE | ID: mdl-26869194

ABSTRACT

A phenotypic screening of thienodiazepines derived from a hit compound found through a binding assay targeting co-stimulatory molecules on T cells and antigen presenting cells successfully led to the discovery of a thienotriazolodiazepine compound (7f) possessing potent immunosuppressive activity. A chemical biology approach has succeeded in revealing that 7f is a first inhibitor of epigenetic bromodomain-containing proteins. 7f is expected to become an anti-cancer agent as well as an immunosuppressive agent.

Subjects

Antineoplastic Agents/pharmacology; Azepines/pharmacology; CD28 Antigens/metabolism; Drug Discovery; Immunosuppressive Agents/pharmacology; T-Lymphocytes/cytology; T-Lymphocytes/drug effects; Antineoplastic Agents/chemical synthesis; Antineoplastic Agents/chemistry; Azepines/chemical synthesis; Azepines/chemistry; CD28 Antigens/immunology; Cell Line, Tumor; Cell Proliferation/drug effects; Cell Survival/drug effects; Dose-Response Relationship, Drug; Drug Screening Assays, Antitumor; Histone Acetyltransferases; Histone Chaperones; Humans; Immunosuppressive Agents/chemical synthesis; Immunosuppressive Agents/chemistry; Molecular Structure; Nuclear Proteins/antagonists & inhibitors; Nuclear Proteins/metabolism; Phenotype; Structure-Activity Relationship; T-Lymphocytes/immunology; T-Lymphocytes/metabolism

An information-theoretic analysis of return maximization in reinforcement learning.

Iwata, Kazunori.

Neural Netw ; 24(10): 1074-81, 2011 Dec.

Article in English | MEDLINE | ID: mdl-21665429

ABSTRACT

We present a general analysis of return maximization in reinforcement learning. This analysis does not require assumptions of Markovianity, stationarity, and ergodicity for the stochastic sequential decision processes of reinforcement learning. Instead, our analysis assumes the asymptotic equipartition property fundamental to information theory, providing a substantially different view from that in the literature. As our main results, we show that return maximization is achieved by the overlap of typical and best sequence sets, and we present a class of stochastic sequential decision processes with the necessary condition for return maximization. We also describe several examples of best sequences in terms of return maximization in the class of stochastic sequential decision processes, which satisfy the necessary condition.

Subjects

Artificial Intelligence; Information Theory; Neural Networks, Computer; Reinforcement, Psychology; Decision Support Techniques; Markov Chains; Probability Learning; Stochastic Processes

A redundancy-based measure of dissimilarity among probability distributions for hierarchical clustering criteria.

Iwata, Kazunori; Hayashi, Akira.

IEEE Trans Pattern Anal Mach Intell ; 30(1): 76-88, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18000326

ABSTRACT

We introduce a novel dissimilarity measure into a probabilistic clustering task to properly measure dissimilarity among multiple clusters when each cluster is characterized by a subpopulation in the mixture model. This measure is called redundancy-based dissimilarity among probability distributions. From the standpoints of both source coding and statistical hypothesis testing, we shed light on several theoretical reasons why redundancy-based dissimilarity among probability distributions is a reasonable measure of dissimilarity among clusters. We also elucidate a principle common to redundancy-based dissimilarity and Ward's method in terms of hierarchical clustering criteria. Moreover, we show several related theorems that are significant for clustering tasks. In the experiments, properties of redundancy-based dissimilarity are examined in comparison with several other measures.

Subjects

Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Computer Simulation; Data Interpretation, Statistical; Image Enhancement/methods; Imaging, Three-Dimensional/methods; Models, Statistical; Reproducibility of Results; Sensitivity and Specificity

N-(3-oxo-acyl)homoserine lactones signal cell activation through a mechanism distinct from the canonical pathogen-associated molecular pattern recognition receptor pathways.

Kravchenko, Vladimir V; Kaufmann, Gunnar F; Mathison, John C; Scott, David A; Katz, Alexander Z; Wood, Malcolm R; Brogan, Andrew P; Lehmann, Mandy; Mee, Jenny M; Iwata, Kazunori; Pan, Qilin; Fearns, Colleen; Knaus, Ulla G; Meijler, Michael M; Janda, Kim D; Ulevitch, Richard J.

J Biol Chem ; 281(39): 28822-30, 2006 Sep 29.

Article in English | MEDLINE | ID: mdl-16893899

ABSTRACT

Innate immune system receptors function as sensors of infection and trigger the immune responses through ligand-specific signaling pathways. These ligands are pathogen-associated products, such as components of bacterial walls and viral nucleic acids. A common response to such ligands is the activation of mitogen-activated protein kinase p38, whereas double-stranded viral RNA additionally induces the phosphorylation of eukaryotic translation initiation factor 2alpha (eIF2alpha). Here we have shown that p38 and eIF2alpha phosphorylation represent two biochemical markers of the effects induced by N-(3-oxo-acyl)homoserine lactones, the secreted products of a number of Gram-negative bacteria, including the human opportunistic pathogen Pseudomonas aeruginosa. Furthermore, N-(3-oxo-dodecanoyl)homoserine lactone induced distension of mitochondria and the endoplasmic reticulum as well as c-jun gene transcription. These effects occurred in a wide variety of cell types including alveolar macrophages and bronchial epithelial cells, requiring the structural integrity of the lactone ring motif and its natural stereochemistry. These findings suggest that N-(3-oxo-acyl)homoserine lactones might be recognized by receptors of the innate immune system. However, we provide evidence that N-(3-oxo-dodecanoyl)homoserine lactone-mediated signaling does not require the presence of the canonical innate immune system receptors, Toll-like receptors, or two members of the NLR/Nod/Caterpillar family, Nod1 and Nod2. These data offer a new understanding of the effects of N-(3-oxo-dodecanoyl)homoserine lactone on host cells and its role in persistent airway infections caused by P. aeruginosa.

Subjects

4-Butyrolactone/analogs & derivatives; Bone Marrow Cells/microbiology; Gene Expression Regulation; Macrophages/microbiology; 4-Butyrolactone/chemistry; 4-Butyrolactone/physiology; Amino Acid Motifs; Animals; Bone Marrow Cells/cytology; Macrophages/metabolism; Mice; Mice, Inbred C57BL; Mice, Knockout; Phosphorylation; Pseudomonas aeruginosa/metabolism; RNA, Viral/metabolism; Signal Transduction

A statistical property of multiagent learning based on Markov decision process.

Iwata, Kazunori; Ikeda, Kazushi; Sakai, Hideaki.

IEEE Trans Neural Netw ; 17(4): 829-42, 2006 Jul.

Article in English | MEDLINE | ID: mdl-16856649

ABSTRACT

We exhibit an important property called the asymptotic equipartition property (AEP) on empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning, such as reinforcement learning (RL), near the end of the learning process. We examine the effect of the conditions among the agents on the achievement of a cooperative policy in three different cases: blind, visible, and communicable. Also, we derive a bound on the speed with which the empirical sequence converges to the best sequence in probability, so that the multiagent learning yields the best cooperative result.

Subjects

Learning; Markov Chains; Models, Statistical

The asymptotic equipartition property in reinforcement learning and its relation to return maximization.

Iwata, Kazunori; Ikeda, Kazushi; Sakai, Hideaki.

Neural Netw ; 19(1): 62-75, 2006 Jan.

Article in English | MEDLINE | ID: mdl-16202563

ABSTRACT

We discuss an important property called the asymptotic equipartition property on empirical sequences in reinforcement learning. This states that the typical set of empirical sequences has probability nearly one, that all elements in the typical set are nearly equi-probable, and that the number of elements in the typical set is an exponential function of the sum of conditional entropies if the number of time steps is sufficiently large. The sum is referred to as stochastic complexity. Using the property we elucidate the fact that the return maximization depends on two factors, the stochastic complexity and a quantity depending on the parameters of environment. Here, the return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of stochastic complexity, which is a qualitative guide in tuning the parameters of action-selection strategy, and show a sufficient condition for return maximization in probability.
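As a rough illustration of the asymptotic equipartition property described above, the sketch below uses a deliberately simplified setting: an independent, identically distributed source rather than the empirical sequences of a reinforcement learning process. It checks numerically that the per-symbol negative log-probability of a long sampled sequence lies close to the source entropy, which is the sense in which typical sequences are nearly equiprobable. The function names and the example distribution are assumptions made for illustration only.

```python
import math
import random


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def empirical_rate_bits(sequence, probs):
    """Per-symbol negative log-probability of a sequence under an i.i.d. model.

    For long sequences drawn from `probs`, the AEP says this value is close
    to entropy_bits(probs) with high probability.
    """
    log_prob = sum(math.log2(probs[symbol]) for symbol in sequence)
    return -log_prob / len(sequence)


if __name__ == "__main__":
    probs = [0.25, 0.75]          # assumed example distribution over {0, 1}
    rng = random.Random(0)        # fixed seed so the run is repeatable
    for n in (10, 100, 10_000):
        seq = rng.choices([0, 1], weights=probs, k=n)
        print(n, empirical_rate_bits(seq, probs), entropy_bits(probs))
```

In this simplified setting the number of typical sequences of length n grows roughly as 2 raised to n times the entropy, which mirrors the abstract's statement that the size of the typical set is exponential in the stochastic complexity.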

Subjects

Information Theory; Learning/physiology; Reinforcement, Psychology; Algorithms; Environment; Models, Neurological; Models, Statistical; Stochastic Processes

A new criterion using information gain for action selection strategy in reinforcement learning.

Iwata, Kazunori; Ikeda, Kazushi; Sakai, Hideaki.

IEEE Trans Neural Netw ; 15(4): 792-9, 2004 Jul.

Article in English | MEDLINE | ID: mdl-15461073

ABSTRACT

In this paper, we regard the sequence of returns as outputs from a parametric compound source. Utilizing the fact that the coding rate of the source shows the amount of information about the return, we describe l-learning algorithms based on the predictive coding idea for estimating an expected information gain concerning future information and give a convergence proof of the information gain. Using the information gain, we propose the ratio w of return loss to information gain as a new criterion to be used in probabilistic action-selection strategies. In experimental results, we found that our w-based strategy performs well compared with the conventional Q-based strategy.

Subjects

Artificial Intelligence; Decision Support Techniques; Information Storage and Retrieval/methods; Information Theory; Models, Statistical; Neural Networks, Computer; Probability Learning; Algorithms; Computer Simulation; Reinforcement, Psychology

Peculiar chiral discrimination of bovine serum albumin to (+/-)-N-dansyl-norleucine.

Abe, Yoshihiro; Yasuoka, Shingo; Shoji, Tomoko; Sugata, Setsuro; Hattori, Kenji; Iwata, Kazunori; Suzuki, Hiroshi.

Anal Sci ; 18(7): 823-5, 2002 Jul.

Article in English | MEDLINE | ID: mdl-12137381

Subjects

Dansyl Compounds/analysis; Dansyl Compounds/chemistry; Norleucine/analysis; Norleucine/chemistry; Serum Albumin, Bovine/chemistry; Animals; Binding, Competitive/drug effects; Cattle; Dansyl Compounds/metabolism; Fluorescence; Hydrogen-Ion Concentration; Molecular Structure; Norleucine/analogs & derivatives; Norleucine/metabolism; Protein Binding/drug effects; Serum Albumin, Bovine/metabolism; Stereoisomerism; Warfarin/pharmacology
