Gradient-Based Relational Reinforcement-Learning of Temporally Extended Policies
File version
Author(s)
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Mark Boddy, Maria Fox, and Sylvie Thiébaux
Date
Size
File type(s)
Location
Providence, Rhode Island, USA
License
Abstract
We consider the problem of computing general policies for decision-theoretic planning problems with temporally extended rewards. We describe a gradient-based approach to relational reinforcement learning (RRL) of policies for that setting. In particular, the learner optimises its behaviour by acting in a set of problems drawn from a target domain. Our approach is similar to inductive policy selection in that the policies learnt are expressed as relational control rules. These rules are generated either (1) by reasoning from a first-order specification of the domain, or (2) more or less arbitrarily according to a taxonomic concept language. To this end, the paper contributes a domain definition language for problems with temporally extended rewards, and a taxonomic concept language in which concepts and relations can be temporal. We evaluate our approach on versions of the miconic, logistics and blocks-world planning benchmarks and find that it is able to learn good policies. Our experiments show that there is a significant advantage in making temporal concepts available in RRL for planning, whether or not rewards are temporally extended.
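To illustrate the kind of learning the abstract describes, the following is a minimal, hypothetical sketch of a gradient-based update for a policy defined as a softmax over weighted relational control rules (a REINFORCE-style estimator). The rule, state and action representations here are illustrative placeholders and are not taken from the paper's actual implementation.

```python
import math

def softmax_policy(weights, rules, state, actions):
    """Score each action by the summed weight of the rules that fire for it,
    then normalise the scores with a softmax."""
    scores = [sum(w for w, r in zip(weights, rules) if r(state, a)) for a in actions]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(weights, rules, episode, alpha=0.1):
    """One REINFORCE-style update of the rule weights from a recorded episode.

    episode: list of (state, actions, chosen_index, return_to_go) tuples
    gathered by acting in a problem drawn from the target domain.
    """
    for state, actions, chosen, ret in episode:
        probs = softmax_policy(weights, rules, state, actions)
        for i, rule in enumerate(rules):
            # Gradient of log pi(a|s) w.r.t. a rule weight: indicator that the
            # rule fires for the chosen action, minus its expectation under pi.
            fired_chosen = 1.0 if rule(state, actions[chosen]) else 0.0
            expected = sum(p for p, a in zip(probs, actions) if rule(state, a))
            weights[i] += alpha * ret * (fired_chosen - expected)
    return weights
```

In this sketch each rule is a Boolean predicate over a state-action pair (e.g. a concept from a taxonomic language applied to the action's arguments); the learner shifts weight toward rules whose firing correlates with high returns.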
Journal Title
Conference Title