Subgradient Methods for Maximum Margin Structured Learning

Workshop Paper, ICML '06 Workshop on Learning in Structured Output Spaces, July 2006

Abstract

Maximum margin structured learning (MMSL) has recently gained recognition within the machine learning community as a tractable method for large-scale learning. However, most current methods are limited in terms of scalability, convergence, or memory requirements. The original Structured SMO method proposed in (Taskar et al., 2003) is slow to converge, particularly for Markov networks of even medium treewidth. Similarly, dual exponentiated gradient techniques suffer from sublinear convergence as well as often large memory requirements. Recently, Taskar et al. (2006) have investigated saddle-point methods for optimization and have succeeded in efficiently solving several problems whose memory requirements would otherwise have been intractable. We propose an alternative gradient-based approach using a regularized risk formulation of MMSL, derived by moving the constraints into the objective to create a convex function in w. This objective is then optimized by a direct generalization of gradient descent, popular in convex optimization, called the subgradient method (Shor, 1985). The abundance of literature on subgradient methods makes this algorithm a decidedly convenient choice. In this case, it is well known that the subgradient method is guaranteed linear convergence when the stepsize is chosen to be constant. Furthermore, this algorithm becomes the well-studied Greedy Projection algorithm of Zinkevich (2003) in the online setting. Using tools developed in (Hazan et al., 2006), we can show that the regret of this online algorithm with respect to the prediction loss grows only sublinearly in time. Perhaps more importantly, the implementation of this algorithm is simple and has intuitive appeal, since an integral part of the computation comes from running the inference algorithm being trained in the inner loop. In what follows, we review the basic formulation of maximum margin structured learning as a convex programming problem before deriving the convex objective and showing how its subgradients can be computed using the specialized inference algorithm inherent to the problem. We finish with theoretical guarantees and experimental results in two domains: sequence labeling for optical character recognition, and imitation learning for path planning in mobile robot navigation. The former problem is well known in MMSL, but the latter is new to this domain. Indeed, although the problem admits a tractable polynomial-sized quadratic programming representation (Ratliff et al., 2006), solving it directly with one of the previously proposed methods would be practically intractable, for reasons similar to those that arise in directly solving the linear programming formulation of Markov Decision Processes.
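To make the training loop concrete, the following is a minimal sketch of the batch subgradient update for a regularized risk objective of the structured hinge form (1/N) sum_i [ max_y ( w . f_i(y) + L_i(y) ) - w . f_i(y_i) ] + (lambda/2) ||w||^2, which is one standard way of folding the margin constraints into the objective as the abstract describes. The helper functions features and loss_augmented_inference, the parameter names, and the constant step size value are illustrative assumptions, not the paper's actual code.

import numpy as np

def mmsl_subgradient_descent(data, features, loss_augmented_inference,
                             dim, reg=1e-2, step=0.1, iters=100):
    """Batch subgradient descent for a regularized structured hinge objective.

    data: list of (x, y_true) training pairs.
    features(x, y): joint feature vector f(x, y), a numpy array of length dim.
    loss_augmented_inference(w, x, y_true): returns the labeling y* maximizing
        w . f(x, y) + L(y, y_true), i.e. the inference routine being trained,
        run with the loss folded into the scores.
    """
    w = np.zeros(dim)
    n = len(data)
    for _ in range(iters):
        g = reg * w  # subgradient of the (reg/2) * ||w||^2 term
        for x, y_true in data:
            # Inner loop: run the (loss-augmented) inference algorithm being trained.
            y_star = loss_augmented_inference(w, x, y_true)
            # Subgradient of the hinge term for this example.
            g += (features(x, y_star) - features(x, y_true)) / n
        w -= step * g  # constant step size, as discussed in the abstract
    return w

Processing one example per update and, where required, projecting w back onto a bounded feasible set after each step recovers the online Greedy Projection variant mentioned above.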

BibTeX

@inproceedings{Ratliff-2006-9440,
author = {Nathan Ratliff and J. Andrew (Drew) Bagnell and Martin Zinkevich},
title = {Subgradient Methods for Maximum Margin Structured Learning},
booktitle = {Proceedings of ICML '06 Workshop on Learning in Structured Output Spaces},
year = {2006},
month = {July},
}