PhD Thesis Proposal
Carnegie Mellon University
12:30 pm - 1:30 pm
A long-standing goal of robotics research is to create algorithms that can automatically learn complex control strategies from scratch. Part of the challenge of applying such algorithms to robots is the choice of representation. While reinforcement learning (RL) algorithms have been successfully applied to many robotics tasks, such as Ball-in-a-Cup and various RoboCup soccer domains, they still require large amounts of training data and training time. Choosing an appropriate representation for the state space, action space, and policy can substantially reduce both.
This thesis focuses on how the choice of representation for states, actions, and policies affects training time and sample complexity in robotic learning tasks. In particular, we focus on two main areas:
1. Simultaneously learning with different representations
2. Transferable representations
The first area explores how the strengths of one representation can make up for the weaknesses of another. For example, while it is possible to learn directly from high-dimensional sensors such as cameras, doing so usually requires far more training samples. Alternatively, one can carefully curate the state information into a set of low-dimensional features that are easier to learn with but may be difficult to compute on a real robot. In previous work, we demonstrated that learning with both representations during training, and then using only the image-based policy at test time, significantly sped up learning. We demonstrated this on the Ball-in-a-Cup task performed by a robot arm, which was trained from scratch in the real world and successfully learned a robust policy.
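One simple way to let both representations learn from the same robot experience is to record every transition in both views, so that separate off-policy learners (one on features, one on images) can sample the same data, with only the image-based policy deployed afterwards. The sketch below assumes this shared-buffer design; the class and field names are hypothetical, the learners themselves are omitted, and it is not a claim about the exact method used in the prior work.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Transition:
    # Hypothetical layout: both views of the same state are stored side by side.
    image: Sequence[float]          # flattened camera image (high-dimensional)
    features: Sequence[float]       # curated low-dimensional features
    action: Sequence[float]
    reward: float
    next_image: Sequence[float]
    next_features: Sequence[float]


class DualViewReplay:
    """Shared ring-buffer replay: every transition carries both representations."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.data: List[Transition] = []
        self.pos = 0                      # index of the next slot to write
        self.rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        if len(self.data) < self.capacity:
            self.data.append(t)
        else:
            self.data[self.pos] = t       # overwrite the oldest entry when full
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, view: str) -> List[Tuple]:
        """Return (obs, action, reward, next_obs) tuples in the requested view."""
        batch = self.rng.sample(self.data, batch_size)
        if view == "features":
            return [(t.features, t.action, t.reward, t.next_features) for t in batch]
        if view == "image":
            return [(t.image, t.action, t.reward, t.next_image) for t in batch]
        raise ValueError(f"unknown view: {view}")


# Usage: one stream of experience feeds both (hypothetical) learners.
buf = DualViewReplay(capacity=3)
for step in range(5):
    buf.add(Transition(image=[float(step)] * 4, features=[float(step)],
                       action=[0.0], reward=float(step),
                       next_image=[float(step + 1)] * 4,
                       next_features=[float(step + 1)]))
feature_batch = buf.sample(2, view="features")  # for the feature-based learner
image_batch = buf.sample(2, view="image")       # for the image-based learner
```

Because both learners draw on identical transitions, the easier-to-learn feature view can drive useful exploration while the image view learns from the very same data.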
The second area explores how to speed up and improve the transfer of policies across changes in the environment. If a policy can be transferred quickly, then a large up-front training cost can be amortized across many environments. In this thesis, we design a representation that makes transfer easy. In prior work, we looked at how to map a standard MDP state and action space to a state and action space made of multi-dimensional tensors. Using a special network architecture, we were able to train policies on both single-agent and multi-agent domains and transfer them as the environment size, the number of agents, and the number of other objects in the environment changed.
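One way to picture why a tensor representation can transfer across environment sizes, as a minimal sketch assuming a 2D grid world: each entity type gets its own channel in a (channels, height, width) tensor, and a convolution with shared weights produces an output map with the same height and width as its input. The same weights therefore apply unchanged when the grid grows or more agents and objects appear. The function names, channel layout, and hand-set kernel are hypothetical, and a single convolutional layer is shown for illustration only; it is not the network architecture from the prior work.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def encode_state(height: int, width: int,
                 entities: Dict[str, List[Tuple[int, int]]],
                 channels: List[str]) -> List[Grid]:
    """Map entity positions to a (channels, height, width) tensor of 0/1 values."""
    tensor = [[[0.0] * width for _ in range(height)] for _ in channels]
    for c, name in enumerate(channels):
        for r, col in entities.get(name, []):
            tensor[c][r][col] = 1.0
    return tensor


def conv_same(tensor: List[Grid], kernel: List[List[List[float]]]) -> Grid:
    """3x3 cross-correlation with zero padding; the output keeps the input H x W.

    kernel has shape (channels, 3, 3) and is shared across all grid positions.
    """
    channels = len(tensor)
    height, width = len(tensor[0]), len(tensor[0][0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for col in range(width):
            total = 0.0
            for c in range(channels):
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, col + dc
                        if 0 <= rr < height and 0 <= cc < width:
                            total += kernel[c][dr + 1][dc + 1] * tensor[c][rr][cc]
            out[r][col] = total
    return out


channels = ["agent", "obstacle"]
# Hypothetical fixed weights: favor cells next to agents, penalize obstacle cells.
kernel = [
    [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],   # agent channel
    [[0.0, 0.0, 0.0], [0.0, -5.0, 0.0], [0.0, 0.0, 0.0]],  # obstacle channel
]
small = encode_state(4, 4, {"agent": [(0, 0)], "obstacle": [(1, 1)]}, channels)
large = encode_state(6, 5, {"agent": [(0, 0), (5, 4)],
                            "obstacle": [(2, 2), (3, 3)]}, channels)
small_scores = conv_same(small, kernel)  # 4 x 4 score map
large_scores = conv_same(large, kernel)  # 6 x 5 score map, same weights
```

Each output cell could be read as a score for an action targeting that cell, so the action space scales with the grid without retraining any weights.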
For the remainder of the thesis, we will expand both of these areas. We will apply our simultaneous representation learning to more domains to further test the results. We will also further explore our transferable representations, for example by testing how the representation performs when observations are noisy. Finally, we will attempt to test a simple combination of our approaches on a RoboCup domain: low-level skills will be learned with our simultaneous representations technique, and the high-level team strategy with our transferable representation.
Thesis Committee Members:
Manuela Veloso, Chair
Martin Riedmiller, Google DeepMind