MSR Thesis Defense

Fan Yang, MSR Student, Robotics Institute,
Carnegie Mellon University
Thursday, May 18
3:00 pm to 4:00 pm
NSH 3305
MSR Thesis Talk: Fan Yang

Title: Exploring Safe Reinforcement Learning for Sequential Decision Making

Abstract:

Safe reinforcement learning (safe RL) addresses the problem of training a policy to maximize the reward while ensuring safety. It is an important step towards applying RL to safety-critical real-world applications. However, safe RL is challenging because of the trade-off between its two objectives, maximizing the reward and satisfying the safety constraints, which can lead to unstable training and overly conservative behavior.

In this thesis, we propose two methods that address these issues in safe RL:

(1) We propose Self-paced Safe Reinforcement Learning, which combines a self-paced curriculum on the safety objective with a base safe RL algorithm, PPO-Lagrangian. During training, the policy starts with easy safety constraints, and the difficulty of the constraints is gradually increased until the desired constraints are satisfied. We evaluate our algorithm on the Safety Gym benchmark and demonstrate that the curriculum helps the underlying safe RL algorithm avoid local optima and improves performance on both the reward and safety objectives.
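As a rough illustration of the idea in (1), the sketch below pairs a Lagrange multiplier update of the kind commonly used in Lagrangian safe RL with a simple curriculum on the cost limit. It is not the thesis implementation: the function names, the fixed tightening step, and the rule "tighten only once the current limit is met" are all assumptions made for illustration.

```python
def update_lagrange_multiplier(lam, episode_cost, cost_limit, lr=0.05):
    """Dual ascent step: raise the penalty weight when the measured cost
    exceeds the limit, lower it otherwise, and never let it go below zero."""
    return max(0.0, lam + lr * (episode_cost - cost_limit))


def next_cost_limit(current_limit, target_limit, episode_cost, step=5.0):
    """Hypothetical self-paced rule (an assumption, not the thesis method):
    tighten the limit by `step` only when the policy already satisfies the
    current limit, and never tighten past the target limit."""
    if episode_cost <= current_limit:
        return max(target_limit, current_limit - step)
    return current_limit
```

In a training loop, one would call `next_cost_limit` after each evaluation and pass the returned limit to `update_lagrange_multiplier`, so the penalty is always computed against the current, easier constraint rather than the final one.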

(2) We propose to learn a policy in a modified MDP in which the safety constraints are embedded into the action space. In this “safety-embedded MDP”, the output of the RL agent is transformed into a sequence of actions using a trajectory optimizer that is guaranteed to be safe, under the assumption that the robot is currently in a safe and quasi-static configuration.

We evaluate our method on the Safety Gym benchmark and show that it achieves significantly higher rewards and fewer safety violations during training than previous work; further, it incurs no safety violations during inference.

We also evaluate our method on a real robot box-pushing task and demonstrate that our method can be safely deployed in the real world.
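The safety-embedded MDP in (2) can be pictured with a toy sketch in which the agent outputs a subgoal and a separate routine turns it into a sequence of low-level actions that never enters a hazard. The routine below is only a stand-in for a trajectory optimizer: the 2D point robot, the circular hazards, and the name `safe_action_sequence` are assumptions for illustration, and it presumes the starting position is already safe.

```python
import math


def safe_action_sequence(position, subgoal, hazards, step_size=0.1):
    """Toy stand-in for a trajectory optimizer (illustrative assumption).

    Walks in a straight line from `position` toward `subgoal` and stops
    before any step whose endpoint would land inside a hazard.

    position, subgoal: (x, y) tuples.
    hazards: list of (x, y, radius) circles.
    Returns a list of (dx, dy) displacement actions.
    Only step endpoints are checked, not the segments between them.
    """
    actions = []
    x, y = position
    while True:
        dx, dy = subgoal[0] - x, subgoal[1] - y
        dist = math.hypot(dx, dy)
        if dist < 1e-9:
            break  # subgoal reached
        scale = min(step_size, dist) / dist
        nx, ny = x + dx * scale, y + dy * scale
        if any(math.hypot(nx - hx, ny - hy) <= r for hx, hy, r in hazards):
            break  # refuse the step that would violate the constraint
        actions.append((nx - x, ny - y))
        x, y = nx, ny
    return actions
```

Because the learning agent only ever chooses subgoals, every action that reaches the environment has passed through this filter, which is the sense in which the constraint is embedded in the action space.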

Committee:

David Held (advisor)

Guanya Shi

Jianren Wang