Gradient-Based Relational Reinforcement-Learning of Temporally Extended Policies

Author(s)
Gretton, Charles
Editor(s)

Mark Boddy, Maria Fox, and Sylvie Thiebaux

Date
2007
Location

Providence, Rhode Island, USA

Abstract

We consider the problem of computing general policies for decision-theoretic planning problems with temporally extended rewards. We describe a gradient-based approach to relational reinforcement-learning (RRL) of policies for that setting. In particular, the learner optimises its behaviour by acting in a set of problems drawn from a target domain. Our approach is similar to inductive policy selection in that the policies learnt are given in terms of relational control-rules. These rules are generated either (1) by reasoning from a first-order specification of the domain, or (2) more or less arbitrarily according to a taxonomic concept language. To this end, the paper contributes a domain definition language for problems with temporally extended rewards, and a taxonomic concept language in which concepts and relations can be temporal. We evaluate our approach on versions of the miconic, logistics and blocks-world planning benchmarks and find that it learns good policies. Our experiments show a significant advantage in making temporal concepts available in RRL for planning, whether or not rewards are temporally extended.

Conference Title

Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (ICAPS 2007)
