A sample-efficient black-box optimizer to train policies for human-in-the-loop systems with user preferences

Nitish Thatte, Helei Duan and Hartmut Geyer
Journal Article, IEEE Robotics and Automation Letters, Vol. 2, No. 2, pp. 993-1000, January 2017

Download Publication (PDF)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

We present a new algorithm for optimizing control policies for human-in-the-loop systems based on qualitative preference feedback. This method is especially applicable to systems such as lower-limb prostheses and exoskeletons, for which it is difficult to define an objective function, hard to identify a model, and costly to repeat hardware experiments. To solve these problems, we combine and extend an algorithm for learning from preferences and the Predictive Entropy Search Bayesian optimization method. The resulting algorithm, Predictive Entropy Search with Preferences (PES-P), solicits preferences between pairs of control parameter sets that optimally reduce the uncertainty in the distribution of objective function optima with the least number of experiments. We find that this algorithm outperforms the expected improvement method (EI) and random comparisons via Latin hypercubes (LH) in three simulation tests that range from optimizing randomly generated functions to tuning control parameters of linear systems and of a walking model. Furthermore, we find in a pilot study on the control of a robotic transfemoral prosthesis that PES-P finds good control parameters quickly and more consistently than EI or LH given real user preferences. The results suggest the proposed algorithm can help engineers optimize certain robotic systems more accurately, efficiently, and consistently.
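To make the overall interaction pattern concrete, the following is a minimal, illustrative sketch of a generic preference-query optimization loop. It is not the authors' PES-P method: the candidate pairs here are drawn at random rather than chosen to maximally reduce uncertainty about the optimum, and a deterministic oracle stands in for a human user. All names (`preference_loop`, `sample`, `prefer`) are hypothetical.

```python
import random


def preference_loop(sample, prefer, n_queries=200, seed=0):
    """Toy preference-based optimizer (illustrative only).

    sample(rng) -> a candidate parameter value
    prefer(a, b) -> True if candidate a is preferred over candidate b
                    (in a real system, this answer comes from the user)
    """
    rng = random.Random(seed)
    best = sample(rng)
    for _ in range(n_queries):
        # PES-P would instead pick the pair whose comparison most
        # reduces uncertainty about where the optimum lies.
        challenger = sample(rng)
        if prefer(challenger, best):
            best = challenger
    return best


# Example: "user" prefers parameters closer to 3 on the interval [0, 6].
best = preference_loop(
    sample=lambda rng: rng.uniform(0.0, 6.0),
    prefer=lambda a, b: (a - 3.0) ** 2 < (b - 3.0) ** 2,
)
```

The point of the sketch is the loop structure: the optimizer never sees objective values, only the outcomes of pairwise comparisons, which is what makes the approach suitable when an objective function is hard to define.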

BibTeX Reference
@article{Thatte-2017-102636,
author = {Nitish Thatte and Helei Duan and Hartmut Geyer},
title = {A sample-efficient black-box optimizer to train policies for human-in-the-loop systems with user preferences},
journal = {IEEE Robotics and Automation Letters},
year = {2017},
month = {January},
volume = {2},
number = {2},
pages = {993-1000},
}