State-Aggregation Algorithms for Learning Probabilistic Models for Robot Control

PhD Thesis, Tech. Report CMU-RI-TR-02-04, Robotics Institute, Carnegie Mellon University, February 2002

Abstract

This thesis addresses the problem of learning probabilistic representations of dynamical systems with non-linear dynamics and hidden state in the form of partially observable Markov decision process (POMDP) models, with the explicit purpose of using these models for robot control. In contrast to the usual approach to learning probabilistic models, which is based on iterative adjustment of probabilities so as to improve the likelihood of the observed data, the algorithms proposed in this thesis take a different approach: they reduce the learning problem to one of state aggregation, clustering in an embedding space of delayed coordinates and subsequently estimating transition probabilities between the aggregated states (clusters). This approach has close ties to the dominant methods for system identification in the field of control engineering, although the characteristics of POMDP models require very different algorithmic solutions. In addition to an extensive investigation of their performance in simulation, the proposed algorithms are applied to two robots built in the course of our experiments. The first is a differential-drive mobile robot with a minimal number of proximity sensors, which has to perform the well-known robotic task of self-localization along the perimeter of its workspace. In comparison to previous neural-net-based approaches to the same problem, our algorithm achieved much higher spatial accuracy of localization. The other task is visual servo-control of an under-actuated arm that has to rotate a flying ball attached to it so as to maintain maximal height of rotation with minimal energy expenditure. Even though this problem is intractable for known control engineering methods due to its strongly non-linear dynamics and partially observable state, a control policy obtained by means of policy iteration on a POMDP model learned by our state-aggregation algorithm performed better than several alternative open-loop and closed-loop controllers.
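
As a rough illustration of the state-aggregation idea summarized above, the following Python sketch builds delayed-coordinate vectors from a scalar observation sequence, aggregates them by k-means clustering, and estimates a transition matrix between clusters by counting. The embedding dimension, the cluster count, and the use of scikit-learn's k-means are illustrative assumptions only, not the thesis's actual algorithm, which must also handle actions and observation probabilities to yield a full POMDP model.

import numpy as np
from sklearn.cluster import KMeans

def delay_embed(obs, dim):
    """Stack `dim` consecutive observations into delayed-coordinate vectors."""
    return np.column_stack([obs[i:len(obs) - dim + i + 1] for i in range(dim)])

def learn_aggregated_model(obs, n_states=8, embed_dim=3, seed=0):
    # 1. Embed the observation sequence in a space of delayed coordinates.
    points = delay_embed(np.asarray(obs, dtype=float), embed_dim)

    # 2. Aggregate states by clustering the embedded points (k-means is an
    #    illustrative choice of clustering method).
    km = KMeans(n_clusters=n_states, n_init=10, random_state=seed).fit(points)
    labels = km.labels_

    # 3. Estimate transition probabilities between aggregated states
    #    (clusters) by counting observed cluster-to-cluster transitions.
    counts = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions fall back to a uniform distribution.
    transitions = np.divide(counts, row_sums,
                            out=np.full_like(counts, 1.0 / n_states),
                            where=row_sums > 0)
    return km, transitions

Applied to a periodic signal such as a sampled sine wave, this sketch recovers a coarse cyclic transition structure among the clusters; a full POMDP model would additionally attach per-cluster observation distributions and condition the transition matrix on the action taken at each step.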

BibTeX

@phdthesis{Nikovski-2002-8387,
author = {Daniel Nikovski},
title = {State-Aggregation Algorithms for Learning Probabilistic Models for Robot Control},
year = {2002},
month = {February},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-02-04},
}