Models for Learning Spatial Interactions in Natural Images for Context-Based Classification

Sanjiv Kumar
tech. report CMU-CS-05-28, Robotics Institute, Carnegie Mellon University, August, 2005


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
Classifying image components (pixels, regions, and objects) into meaningful categories is a challenging task because of the ambiguities inherent in visual data. Natural images exhibit strong contextual dependencies in the form of spatial interactions among components. For example, neighboring pixels tend to have similar class labels, and different parts of an object are related through geometric constraints. Beyond these, different regions (e.g., sky and water) or objects (e.g., monitor and keyboard) appear in restricted spatial configurations. Modeling these interactions is crucial for achieving good classification accuracy.

In this thesis, we present discriminative field models that capture spatial interactions in images within a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. Discriminative fields offer several advantages over the Markov Random Fields (MRFs) popularly used in computer vision. First, they can capture arbitrary dependencies in the observed data by relaxing the restrictive assumption of conditional independence generally made in MRFs for tractability. Second, the interaction between labels in discriminative fields is based on the observed data, instead of being fixed a priori as in MRFs. This is critical for incorporating different types of context in images within a single framework. Finally, discriminative fields derive their classification power from probabilistic discriminative models instead of the generative models used in MRFs.
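
For orientation, a conditional model of this kind is commonly written in the following form. This is a sketch under assumed notation, not a formula taken from this page: $\mathbf{x}$ denotes the labels, $\mathbf{y}$ the observed image data, $S$ the set of sites, $\mathcal{N}_i$ the neighbors of site $i$, and $A_i$ and $I_{ij}$ the unary (association) and pairwise (interaction) potentials.

```latex
% Assumed notation (not from the source page):
%   x = labels, y = observed data, S = sites, N_i = neighbors of site i
%   A_i = unary (association) potential, I_ij = pairwise (interaction) potential
P(\mathbf{x} \mid \mathbf{y})
  = \frac{1}{Z(\mathbf{y})}
    \exp\!\Big(
      \sum_{i \in S} A_i(x_i, \mathbf{y})
      + \sum_{i \in S} \sum_{j \in \mathcal{N}_i} I_{ij}(x_i, x_j, \mathbf{y})
    \Big)
```

Both potentials take the full observation $\mathbf{y}$ as an argument, which reflects the first two advantages listed above: no conditional independence assumption is imposed on the data, and the pairwise term can vary with the image instead of depending on the labels alone as in a typical MRF prior.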

Since the graphs induced by the discriminative fields may have arbitrary topology, exact maximum likelihood parameter learning may not be feasible. We present an approach that approximates the gradients of the likelihood with simple piecewise constant functions constructed using inference techniques. To exploit different levels of contextual information in images, a two-layer hierarchical formulation is also described. It encodes both short-range interactions (e.g., pixelwise label smoothing) and long-range interactions (e.g., relative configurations of objects or regions) in a tractable manner. The models proposed in this thesis are general enough to be applied seamlessly, within a single framework, to several challenging computer vision tasks such as contextual object detection, semantic scene segmentation, texture recognition, and image denoising.


Notes
Number of pages: 183

Text Reference
Sanjiv Kumar, "Models for Learning Spatial Interactions in Natural Images for Context-Based Classification," tech. report CMU-CS-05-28, Robotics Institute, Carnegie Mellon University, August, 2005

BibTeX Reference
@techreport{Kumar_2005_5143,
   author = "Sanjiv Kumar",
   title = "Models for Learning Spatial Interactions in Natural Images for Context-Based Classification",
   institution = "Robotics Institute",
   month = "August",
   year = "2005",
   number = "CMU-CS-05-28",
   address = "Pittsburgh, PA",
}