Learning the Structure of Continuous Markov Decision Processes
Other Titles: Erlernen der Struktur von kontinuierlichen Markow-Entscheidungsprozessen (German title)
Authors: Metzen, Jan Hendrik
Supervisor: Kirchner, Frank
First Reviewer: Kirchner, Frank
Second Reviewer: Kreowski, Hans-Jörg
Abstract:
There is growing interest in artificial intelligent agents that can operate autonomously for extended periods of time in complex environments and fulfill a variety of different tasks. Such agents will face problems during their lifetime that may not be foreseeable at the time of their deployment. The capacity for lifelong learning of new behaviors is therefore an essential prerequisite for such agents, as it enables them to deal with unforeseen situations. However, learning every complex behavior anew from scratch would be cumbersome for the agent. It is more plausible to consider behavior to be modular and to let the agent acquire a set of reusable building blocks for behavior, the so-called skills. Once acquired, these skills can facilitate fast learning and adaptation of behavior to new situations. This work focuses on computational approaches to skill acquisition, namely which kinds of skills should be acquired and how to acquire them. The former is commonly called "skill discovery" and the latter "skill learning". The main contribution of this thesis is a novel incremental skill acquisition approach that is suited for lifelong learning. In this approach, the agent incrementally learns a graph-based representation of a domain and exploits certain properties of this graph, such as its bottlenecks, for skill discovery. This thesis proposes a novel approach for learning a graph-based representation of continuous domains by formalizing the problem as a probabilistic generative model. Furthermore, a new incremental agglomerative clustering approach for identifying the bottlenecks of such graphs is presented. Building on this, the thesis proposes a novel intrinsic motivation system that enables an agent to allocate time intelligently between skill discovery and skill learning in developmental settings, where the agent is not constrained by external tasks.
The results of this thesis show that the proposed skill acquisition approach is suited for continuous domains and can cope with domain stochasticity and with different exploratory behaviors of the agent. The acquired skills are reusable and versatile and can be used in multi-task and lifelong learning settings in high-dimensional problems.
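To make the notion of a "bottleneck" in a transition graph concrete, here is a small illustrative sketch. It is not the thesis's method. The thesis identifies bottlenecks with an incremental agglomerative clustering approach, whereas this sketch swaps in a different, commonly used criterion: betweenness centrality, computed with Brandes' algorithm. It runs on a hypothetical hand-built graph of two fully connected "rooms" joined by a single doorway state. The helper names `two_rooms` and `betweenness` are assumptions made for this example.

```python
from collections import deque


def two_rooms(k=4):
    """Hypothetical toy graph: two fully connected rooms of k states,
    joined through a single 'door' state (assumed example, not from the thesis)."""
    room_a = [f"a{i}" for i in range(k)]
    room_b = [f"b{i}" for i in range(k)]
    adj = {v: set() for v in room_a + room_b + ["door"]}
    for room in (room_a, room_b):
        for u in room:
            for v in room:
                if u != v:
                    adj[u].add(v)
    # Only a0 and b0 touch the doorway state.
    for u in (room_a[0], room_b[0]):
        adj[u].add("door")
        adj["door"].add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.
    Returns, for each node, how many shortest paths between other node pairs pass through it."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # BFS distance from s (-1 = unvisited)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


if __name__ == "__main__":
    scores = betweenness(two_rooms())
    print(max(scores, key=scores.get), scores["door"])
```

In this toy graph the doorway state lies on every shortest path between the two rooms and therefore receives the highest score. A bottleneck state of this kind is the sort of subgoal that a skill ("go to the door") could target. Computing betweenness from scratch for every update is costly on large, growing graphs, which is one reason incremental approaches such as the one proposed in the thesis are attractive for lifelong learning.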
Keywords: Reinforcement Learning; Skill Discovery; Skill Acquisition; Intrinsic Motivation; Hierarchical Reinforcement Learning; Graph
Issue Date: 21-Feb-2014
Type: Dissertation
URN: urn:nbn:de:gbv:46-00103656-17
Institution: Universität Bremen
Faculty: FB3 Mathematik/Informatik (Faculty 3, Mathematics/Computer Science)
Appears in Collections: Dissertationen