Learning and Exploration in Autonomous Agents: Adaptive and Active Perspectives
|Other Titles:||Lernen und Exploration in autonomen Agenten: adaptive und aktive Perspektiven||Authors:||Si, Bailu||Supervisor:||Pawelzik, Klaus||1. Expert:||Pawelzik, Klaus||2. Expert:||Burgard, Wolfram||Abstract:||
Place learning and exploration in autonomous agents is important for understanding and building intelligent agents. Experimental and computational studies in psychology, computer science, and neuroscience have achieved great successes. This thesis provides a theoretical investigation into three major problems in place learning and exploration: localization, mapping, and action selection.

Two benchmark models are introduced to analyze the basic aspects of place learning and exploration. The checkerboard maze is a stochastic grid-type environment. The exploration performance of an agent is evaluated by its sensory prediction capability; the evaluation does not require knowledge of the agent's internal representation. Furthermore, the checkerboard maze is reduced to the classical multi-armed bandit model in order to analyze the action selection problem in detail. Exploration performance in the multi-armed bandit is quantified by the estimation error of the reward means.

Place learning and exploration is modelled as a Partially Observable Markov Decision Process (POMDP) and implemented by a Bayesian network with internal dynamics. The map of the environment is represented by the observation probability of the POMDP and is stored in the weights of a generative model. The belief state is tracked by Bayes filtering, and the distribution of the sensory input is predicted by the generative model. The mapping between locations and sensory inputs is learned by minimizing prediction errors with an on-line adaptive multiplicative gradient descent rule.

In the $n$-armed bandit, the optimal exploration policy in the sense of total mean squared error is proved to be gain-maximization exploration. The sample complexity of the proposed ideal gain-maximization exploration policy can be $O(n)$, as small as that of the counter-based and error-based policies, both in the sense of total mean squared error and of expected $0/1$ loss.
For realistic situations where the reward variances are unknown, a realistic gain-maximization exploration policy is derived using upper confidence limits of the reward variances. Gain-maximization is a general principle unifying a wide range of exploration strategies, including counter-based and error-based policies: by generalizing the total mean squared error, the counter-based and error-based exploration policies are shown to result from the gain-maximization principle with respect to different variants of the general objective measure.

Formulating exploration in reward maximization as learning the differences between the reward means, we derive gain-maximization selection policies both for the ideal case and for realistic situations. Through a simple linear trade-off, gain-based reward-maximization policies achieve smaller regret on fixed data sets than classical strategies such as interval estimation methods, the $\epsilon$-greedy strategy, and the upper confidence bound policy.

Action selection in the full place learning problem is implemented by a network maximizing the agent's intrinsic rewards. Action values are learned in a similar way as in $Q$-learning. Based on the results of local gain adaptation and of the multi-armed bandit, two gain functions are defined as the agent's curiosity. Moreover, an estimation of a global exploration performance measure is found to model competence motivation. Active exploration directed by the proposed intrinsic reward functions not only outperforms random exploration, but also produces curiosity behavior observed in natural agents.
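The belief tracking described in the abstract can be sketched as a discrete Bayes filter over grid locations. The transition matrix `T`, observation model `O`, and all variable names below are illustrative assumptions of this sketch, not taken from the thesis; `O` plays the role of the learned "map" (observation probabilities stored by the generative model).

```python
import numpy as np

def bayes_filter_step(belief, T, O, observation):
    """One predict/correct step of a discrete Bayes filter (illustrative sketch).

    belief:      (n_states,) current belief over locations
    T:           (n_states, n_states) transition model, T[s2, s1] = P(s2 | s1, action)
    O:           (n_states, n_obs) observation model, O[s, o] = P(o | s) -- the 'map'
    observation: index of the sensory input actually observed
    """
    predicted = T @ belief                     # prediction via the motion model
    corrected = predicted * O[:, observation]  # Bayes correction with the likelihood
    return corrected / corrected.sum()         # normalize back to a distribution

# Toy usage: four locations, two possible observations, agent stays put.
belief = np.full(4, 0.25)                      # uniform prior over locations
T = np.eye(4)                                  # illustrative "no motion" transition
O = np.array([[0.9, 0.1],
              [0.1, 0.9],
              [0.1, 0.9],
              [0.9, 0.1]])
belief = bayes_filter_step(belief, T, O, observation=1)
# Belief now concentrates on the two locations likely to emit observation 1.
```

Repeating this step as actions are taken and observations arrive yields the localization component; the mapping component would adapt `O` itself from prediction errors.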
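A minimal sketch of gain-maximization exploration in an $n$-armed bandit, under the assumption (mine, not the thesis') that the gain of pulling arm $i$ is the expected drop in squared error of its mean estimate, $\sigma_i^2/n_i - \sigma_i^2/(n_i+1)$. With known variances, greedily pulling the arm of maximal gain minimizes the total mean squared error of the reward-mean estimates; all names and constants below are illustrative.

```python
import numpy as np

def gain_max_explore(pull, variances, horizon):
    """Explore a bandit for `horizon` pulls, choosing arms by maximal gain.

    pull:      function arm -> sampled reward
    variances: known reward variances (the 'ideal' case of the abstract)
    """
    n = len(variances)
    counts = np.ones(n)
    sums = np.array([pull(a) for a in range(n)], dtype=float)  # one pull per arm
    for _ in range(horizon - n):
        gain = variances / (counts * (counts + 1))  # expected MSE reduction per arm
        arm = int(np.argmax(gain))                  # gain-maximizing choice
        sums[arm] += pull(arm)
        counts[arm] += 1
    return sums / counts, counts                    # mean estimates, pull counts

# Toy usage: three Gaussian arms with very different variances.
rng = np.random.default_rng(0)
variances = np.array([0.1, 1.0, 4.0])
pull = lambda a: rng.normal(0.0, np.sqrt(variances[a]))
estimates, counts = gain_max_explore(pull, variances, horizon=60)
# The high-variance arm receives the most pulls, as its estimate is noisiest.
```

In the realistic case the abstract describes, the known `variances` would be replaced by upper confidence limits on sample variances.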
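The intrinsically motivated action selection can be sketched as a temporal-difference update in the style of $Q$-learning, with a curiosity signal in place of an external reward. The update rule shape, learning rate, and discount factor below are generic $Q$-learning assumptions, not the thesis' specific network.

```python
def q_update(Q, state, action, intrinsic_reward, next_state,
             alpha=0.1, gamma=0.9):
    """One TD update of Q[state][action] driven by an intrinsic reward.

    Q is a list of per-state action-value lists; alpha and gamma are
    illustrative learning-rate and discount constants.
    """
    td_target = intrinsic_reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q

# Toy usage: two states, two actions, a curiosity reward of 1.0 for
# taking action 1 in state 0.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_update(Q, state=0, action=1, intrinsic_reward=1.0, next_state=1)
# Q[0][1] moves toward the intrinsically rewarded transition.
```

In the thesis' framing, `intrinsic_reward` would come from the proposed gain functions (curiosity) or the estimated global exploration performance (competence motivation).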
|Keywords:||Autonomous Agents, Exploration, Active Learning, Reinforcement Learning, Gain maximization, Robot Exploration, Place Learning, Localization, Mapping||Issue Date:||22-Mar-2007||URN:||urn:nbn:de:gbv:46-diss000107482||Institution:||Universität Bremen||Faculty:||FB1 Physik/Elektrotechnik|
|Appears in Collections:||Dissertationen|