Learning the Structure of Continuous Markov Decision Processes

Metzen, Jan Hendrik

Citation link (URN)

https://nbn-resolving.de/urn:nbn:de:gbv:46-00103656-17

Publication date

2014-02-21

Author

Metzen, Jan Hendrik

Supervisor

Kirchner, Frank

Reviewer

Kreowski, Hans-Jörg

Abstract

There is growing interest in artificial intelligent agents that can operate autonomously for extended periods of time in complex environments and fulfill a variety of different tasks. Such agents will face different problems during their lifetime, which may not be foreseeable at the time of their deployment. Thus, the capacity for lifelong learning of new behaviors is an essential prerequisite for this kind of agent, as it enables such agents to deal with unforeseen situations. However, learning every complex behavior from scratch would be cumbersome for the agent. It is more plausible to consider behavior to be modular and let the agent acquire a set of reusable building blocks for behavior, the so-called skills. Once acquired, these skills might facilitate fast learning and adaptation of behavior to new situations. This work focuses on computational approaches for skill acquisition, namely which kinds of skills should be acquired and how to acquire them. The former is commonly denoted as "skill discovery" and the latter as "skill learning". The main contribution of this thesis is a novel incremental skill acquisition approach that is suited for lifelong learning. In this approach, the agent incrementally learns a graph-based representation of a domain and exploits certain properties of this graph, such as its bottlenecks, for skill discovery. This thesis proposes a novel approach for learning a graph-based representation of continuous domains based on formalizing the problem as a probabilistic generative model. Furthermore, a new incremental agglomerative clustering approach for identifying bottlenecks of such graphs is presented. Building on this, the thesis proposes a novel intrinsic motivation system that enables an agent to intelligently allocate time between skill discovery and skill learning in developmental settings, where the agent is not constrained by external tasks.
The results of this thesis show that the resulting skill acquisition approach is suited for continuous domains and can deal with domain stochasticity and different explorative behavior of the agent. The acquired skills are reusable and versatile and can be used in multi-task and lifelong learning settings in high-dimensional problems.

Keywords

Reinforcement Learning; Skill Discovery; Skill Acquisition; Intrinsic Motivation; Hierarchical Reinforcement Learning; Graph

Institution

Universität Bremen

Faculty

Fachbereich 03: Mathematik/Informatik (FB 03)

Document type

Dissertation

Secondary publication

No

Language

English

Files

Name

00103656-1.pdf

Size

13.9 MB

Format

Adobe PDF

Checksum

MD5: 1e15975d90657dfd3b1a6b9f86fbe003