Learning and Exploration in Autonomous Agents: Adaptive and Active Perspectives
|Other Titles:||Lernen und Exploration in autonomen Agenten: adaptive und aktive Perspektiven||Authors:||Si, Bailu||Supervisor:||Pawelzik, Klaus||1. Expert:||Pawelzik, Klaus||2. Expert:||Burgard, Wolfram||Abstract:||
Place learning and exploration in autonomous agents is important for understanding and building intelligent agents. Experimental and computational studies in psychology, computer science, and neuroscience have achieved great successes. This thesis provides a theoretical investigation into three major problems in place learning and exploration: localization, mapping, and action selection.

Two benchmark models are introduced to analyze the basic aspects of place learning and exploration. The checkerboard maze is a stochastic grid-type environment. The exploration performance of an agent is evaluated by its sensory prediction capability; the evaluation does not require knowledge of the agent's internal representation. Furthermore, the checkerboard maze is reduced to the classical multi-armed bandit model in order to analyze the action selection problem in detail. Exploration performance in the multi-armed bandit is quantified by the estimation error of the reward means.

Place learning and exploration is modelled as a Partially Observable Markov Decision Process (POMDP) and implemented by a Bayesian network with internal dynamics. The map of the environment is represented by the observation probability of the POMDP and is stored in the weights of a generative model. The belief state is tracked by Bayes filtering, and the distribution of the sensory input is predicted by the generative model. The mapping between locations and sensory inputs is learned by minimizing prediction errors with an on-line adaptive multiplicative gradient descent rule.

In the $n$-armed bandit, the optimal exploration policy in the sense of total mean squared error is proved to be gain-maximization exploration. The sample complexity of the proposed ideal gain-maximization exploration policy can be $O(n)$, as small as that of the counter-based and error-based policies, both in the sense of total mean squared error and of expected $0/1$ loss.
For realistic situations where the reward variances are unknown, a realistic gain-maximization exploration policy is derived using upper confidence limits of the reward variances. Gain-maximization is a general principle unifying a wide range of exploration strategies, including counter-based and error-based policies: by generalizing the total mean squared error, the counter-based and error-based exploration policies are shown to result from the gain-maximization principle with respect to different variants of the general objective measure.

Formulating exploration in reward maximization as learning the differences between the reward means, we derive gain-maximization selection policies both for the ideal case and for realistic situations. Through a simple linear trade-off, gain-based reward-maximization policies achieve smaller regret on fixed data sets than classical strategies such as interval estimation methods, the $\epsilon$-greedy strategy, and the upper confidence bound policy.

Action selection in the full place learning problem is implemented by a network maximizing the agent's intrinsic rewards. Action values are learned in a similar way as in $Q$-learning. Based on the results of local gain adaptation and of the multi-armed bandit, two gain functions are defined as the agent's curiosity. Moreover, an estimation of a global exploration performance measure is found to model competence motivation. Active exploration directed by the proposed intrinsic reward functions not only outperforms random exploration, but also produces curiosity behavior observed in natural agents.
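The belief tracking described in the abstract can be sketched as a discrete Bayes filter over grid locations. The transition matrix `T`, observation model `O`, and all variable names below are illustrative assumptions of this sketch, not taken from the thesis; `O` plays the role of the learned "map" (observation probabilities stored by the generative model).

```python
import numpy as np

def bayes_filter_step(belief, T, O, observation):
    """One predict/correct step of a discrete Bayes filter (illustrative sketch).

    belief:      (n_states,) current belief over locations
    T:           (n_states, n_states) transition model, T[s2, s1] = P(s2 | s1, action)
    O:           (n_states, n_obs) observation model, O[s, o] = P(o | s) -- the 'map'
    observation: index of the sensory input actually observed
    """
    predicted = T @ belief                     # prediction via the motion model
    corrected = predicted * O[:, observation]  # Bayes correction with the likelihood
    return corrected / corrected.sum()         # normalize back to a distribution

# Toy usage: four locations, two possible observations, agent stays put.
belief = np.full(4, 0.25)                      # uniform prior over locations
T = np.eye(4)                                  # illustrative "no motion" transition
O = np.array([[0.9, 0.1],
              [0.1, 0.9],
              [0.1, 0.9],
              [0.9, 0.1]])
belief = bayes_filter_step(belief, T, O, observation=1)
# Belief now concentrates on the two locations likely to emit observation 1.
```

Repeating this step as actions are taken and observations arrive yields the localization component; the mapping component would adapt `O` itself from prediction errors.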
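A minimal sketch of gain-maximization exploration in an $n$-armed bandit, under the assumption (mine, not the thesis') that the gain of pulling arm $i$ is the expected drop in squared error of its mean estimate, $\sigma_i^2/n_i - \sigma_i^2/(n_i+1)$. With known variances, greedily pulling the arm of maximal gain minimizes the total mean squared error of the reward-mean estimates; all names and constants below are illustrative.

```python
import numpy as np

def gain_max_explore(pull, variances, horizon):
    """Explore a bandit for `horizon` pulls, choosing arms by maximal gain.

    pull:      function arm -> sampled reward
    variances: known reward variances (the 'ideal' case of the abstract)
    """
    n = len(variances)
    counts = np.ones(n)
    sums = np.array([pull(a) for a in range(n)], dtype=float)  # one pull per arm
    for _ in range(horizon - n):
        gain = variances / (counts * (counts + 1))  # expected MSE reduction per arm
        arm = int(np.argmax(gain))                  # gain-maximizing choice
        sums[arm] += pull(arm)
        counts[arm] += 1
    return sums / counts, counts                    # mean estimates, pull counts

# Toy usage: three Gaussian arms with very different variances.
rng = np.random.default_rng(0)
variances = np.array([0.1, 1.0, 4.0])
pull = lambda a: rng.normal(0.0, np.sqrt(variances[a]))
estimates, counts = gain_max_explore(pull, variances, horizon=60)
# The high-variance arm receives the most pulls, as its estimate is noisiest.
```

In the realistic case the abstract describes, the known `variances` would be replaced by upper confidence limits on sample variances.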
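The intrinsically motivated action selection can be sketched as a temporal-difference update in the style of $Q$-learning, with a curiosity signal in place of an external reward. The update rule shape, learning rate, and discount factor below are generic $Q$-learning assumptions, not the thesis' specific network.

```python
def q_update(Q, state, action, intrinsic_reward, next_state,
             alpha=0.1, gamma=0.9):
    """One TD update of Q[state][action] driven by an intrinsic reward.

    Q is a list of per-state action-value lists; alpha and gamma are
    illustrative learning-rate and discount constants.
    """
    td_target = intrinsic_reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q

# Toy usage: two states, two actions, a curiosity reward of 1.0 for
# taking action 1 in state 0.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_update(Q, state=0, action=1, intrinsic_reward=1.0, next_state=1)
# Q[0][1] moves toward the intrinsically rewarded transition.
```

In the thesis' framing, `intrinsic_reward` would come from the proposed gain functions (curiosity) or the estimated global exploration performance (competence motivation).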
|Keywords:||Autonomous Agents, Exploration, Active Learning, Reinforcement Learning, Gain maximization, Robot Exploration, Place Learning, Localization, Mapping||Issue Date:||22-Mar-2007||URN:||urn:nbn:de:gbv:46-diss000107482||Institution:||Universität Bremen||Faculty:||FB1 Physik/Elektrotechnik|
|Appears in Collections:||Dissertationen|