Learning Reactive Flight Control Policies: From LIDAR Measurements to Actions

Master's Thesis, Tech. Report CMU-RI-TR-18-58, Robotics Institute, Carnegie Mellon University, August 2018

Abstract

The end goal of a quadrotor reactive flight control pipeline is to provide control outputs, often in the form of roll and pitch commands, that safely navigate the robot through an environment based on sensor inputs. Classical state estimation and control algorithms break this problem down by first estimating the robot's velocity and then computing a roll and pitch command based on that estimate. However, this approach is error-prone in geometrically degenerate environments that do not provide enough information to accurately estimate vehicle velocity. Recent work has shown that learned end-to-end policies can unify obstacle detection and planning for vision-based systems. This work applies similar methods to learn an end-to-end control policy for a lidar-equipped flying robot that replaces both the state estimator and the controller, while leaving long-term planning to traditional planning algorithms. Specifically, this work demonstrates the feasibility of using imitation learning and recurrent neural networks (RNNs) to train a policy that maps directly from lidar range measurements to robot accelerations in realistic simulation environments, without an explicit state estimate. The policy is trained entirely on simulated data using procedurally generated environments, achieving a mean distance between collisions of 1.7 km. Additionally, real-world flight tests through tunnel and tunnel-like environments demonstrate that a policy learned in simulation can successfully control a real quadcopter.
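The core idea above — a recurrent policy that maps raw lidar ranges to acceleration commands, with the hidden state supplying the temporal context a classical pipeline would get from a velocity estimate — can be sketched as follows. This is a minimal illustrative example, not the thesis's actual architecture: the layer sizes, normalization, and class names are all assumptions, and the weights shown as random would in practice be fitted by imitation learning against an expert controller.

```python
import numpy as np

class RecurrentLidarPolicy:
    """Hypothetical sketch of a recurrent policy mapping a lidar range
    scan directly to a 2-D acceleration command. No explicit velocity
    estimate is computed; the hidden state carries history across scans."""

    def __init__(self, n_beams=64, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hidden)
        # Randomly initialized here for illustration; imitation learning
        # (e.g., behavior cloning on expert trajectories) would fit these.
        self.W_in = rng.normal(0.0, s, (n_hidden, n_beams))   # scan -> hidden
        self.W_h = rng.normal(0.0, s, (n_hidden, n_hidden))   # recurrence
        self.b_h = np.zeros(n_hidden)
        self.W_out = rng.normal(0.0, s, (2, n_hidden))        # hidden -> (ax, ay)
        self.h = np.zeros(n_hidden)

    def reset(self):
        """Clear the hidden state at the start of a new flight."""
        self.h = np.zeros_like(self.h)

    def act(self, ranges, max_range=10.0):
        """One control step: scan in, acceleration command out."""
        # Normalize ranges to [0, 1] so saturated beams don't dominate.
        x = np.clip(ranges, 0.0, max_range) / max_range
        self.h = np.tanh(self.W_in @ x + self.W_h @ self.h + self.b_h)
        return self.W_out @ self.h  # 2-D acceleration command

policy = RecurrentLidarPolicy()
scan = np.full(64, 5.0)        # a flat 5 m scan as a stand-in for real data
accel = policy.act(scan)       # hidden state now also reflects this scan
```

Calling `act` repeatedly on a stream of scans lets the hidden state accumulate the motion cues (e.g., how fast walls are approaching) that a separate state estimator would otherwise provide.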

BibTeX

@mastersthesis{Zeng-2018-107556,
author = {Sam Zeng},
title = {Learning Reactive Flight Control Policies: From LIDAR Measurements to Actions},
year = {2018},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-18-58},
keywords = {Reactive Control, Imitation Learning, Reinforcement Learning, Field Robotics, Simulation, Environment Generation, Tunnels},
}