Switching between representations in reinforcement learning
article
This chapter presents and evaluates an on-line representation selection method for factored MDPs. The method addresses a special case of the feature selection problem that only considers certain subsets of features, which we call candidate representations. A motivation for the method is that it can potentially deal with problems where other structure learning algorithms are infeasible due to a large degree of the associated dynamic Bayesian network (DBN). Our method uses switch actions to select a representation and uses off-policy updating to improve the policy of representations that were not selected. We demonstrate the validity of the method by showing, for a contextual bandit task and a regular MDP, that given a feature set containing only a single relevant feature, the switch method finds this feature very efficiently. We also show, for a contextual bandit task, that switching between a set of relevant features and a subset of these features can outperform both individual representations, since the switch method combines the fast initial learning of the small representation with the high asymptotic performance of the large representation.
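The core idea in the abstract can be sketched in code: the agent first takes a switch action (choosing a candidate representation), then acts within the chosen representation, and finally updates *all* representations off-policy from the same experience. The sketch below is a minimal illustration for a tabular contextual bandit, assuming epsilon-greedy selection at both levels; the class name `SwitchAgent` and fields such as `switch_value` are illustrative and not taken from the chapter.

```python
import random

class SwitchAgent:
    """Illustrative sketch (not the chapter's exact algorithm):
    pick a candidate representation via a switch action, act within it,
    and update every representation off-policy from the same sample."""

    def __init__(self, representations, n_actions, alpha=0.1, eps=0.1):
        self.reps = representations          # list of feature-index tuples
        self.n_actions = n_actions
        self.alpha, self.eps = alpha, eps
        # one tabular action-value function per representation
        self.Q = [dict() for _ in representations]
        # on-policy estimate of each representation's reward when selected
        self.switch_value = [0.0] * len(representations)

    def _context(self, rep_idx, obs):
        # project the full observation onto this representation's features
        return tuple(obs[i] for i in self.reps[rep_idx])

    def act(self, obs):
        # switch action: epsilon-greedy over representations
        if random.random() < self.eps:
            rep = random.randrange(len(self.reps))
        else:
            rep = max(range(len(self.reps)),
                      key=lambda r: self.switch_value[r])
        # primitive action: epsilon-greedy within the chosen representation
        ctx = self._context(rep, obs)
        if random.random() < self.eps:
            action = random.randrange(self.n_actions)
        else:
            action = max(range(self.n_actions),
                         key=lambda a: self.Q[rep].get((ctx, a), 0.0))
        return rep, action

    def update(self, obs, rep, action, reward):
        # off-policy updating: every representation learns from the sample,
        # regardless of which representation the switch action selected
        for r in range(len(self.reps)):
            key = (self._context(r, obs), action)
            q = self.Q[r].get(key, 0.0)
            self.Q[r][key] = q + self.alpha * (reward - q)
        # only the selected representation's switch value is updated
        sv = self.switch_value[rep]
        self.switch_value[rep] = sv + self.alpha * (reward - sv)

# toy contextual bandit: two binary features, only feature 0 is relevant
random.seed(0)
agent = SwitchAgent(representations=[(0,), (1,)], n_actions=2)
for _ in range(3000):
    obs = (random.randrange(2), random.randrange(2))
    rep, action = agent.act(obs)
    reward = 1.0 if action == obs[0] else 0.0   # reward depends on feature 0
    agent.update(obs, rep, action, reward)
```

In this toy run the switch value of the representation containing the relevant feature grows above that of the irrelevant one, so the switch action concentrates on it, mirroring the feature-finding experiment described above.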
TNO Identifier
954180
ISSN
1860949X
ISBN
9783642116872
Source
Studies in Computational Intelligence, 281, pp. 65-84.
Editor(s)
Babuska, R.
Groen, F.C.A.
Pages
65-84
Files
To receive the publication files, please send an e-mail request to TNO Repository.