Complete Cross-Validation for Nearest Neighbor Classifiers

Matthew Mullin and Rahul Sukthankar
Proceedings of the International Conference on Machine Learning, June, 2000.


Cross-validation is an established technique for estimating the accuracy of a classifier and is normally performed either using a number of random test/train partitions of the data, or using k-fold cross-validation. We present a technique for calculating the complete cross-validation for nearest-neighbor classifiers: i.e., averaging over all desired test/train partitions of data. This technique is extended to several common variants, such as K-nearest-neighbor classifiers, stratified data partitioning and arbitrary loss functions. We demonstrate, with complexity analysis and experimental timing results, that the technique can be performed in time comparable to k-fold cross-validation, though in effect it averages an exponential number of trials. We show that the results of complete cross-validation have the same bias as those of subsampling and k-fold cross-validation, with some reduction in variance. This algorithm offers significant benefits both in terms of time and accuracy.
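The idea of averaging over every test/train partition without enumerating them can be illustrated for the simplest case. The following is a minimal sketch, not the authors' implementation: it assumes a 1-nearest-neighbor classifier, 0/1 loss, a fixed training-set size m, and distance ties broken by point index. The function names (complete_cv_error_1nn, brute_force_cv_error_1nn) and the toy data are hypothetical. It relies on a counting argument: for a test point, the j-th closest of the other n-1 points is its nearest training neighbor exactly when the j-1 closer points all fall in the test set and the j-th falls in the training set, which happens in C(n-1-j, m-1) of the C(n-1, m) possible partitions.

```python
from itertools import combinations
from math import comb


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def complete_cv_error_1nn(points, labels, m):
    """Mean 1-NN test error averaged over ALL splits with m training points.

    Illustrative sketch (assumption: 1-NN, 0/1 loss, ties broken by index).
    """
    n = len(points)
    assert 1 <= m < n
    # Number of ways to split the other n-1 points, given point i is a test point.
    total_splits = comb(n - 1, m)
    error = 0.0
    for i in range(n):
        # Other points ordered from nearest to farthest.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (sq_dist(points[i], points[j]), j))
        for rank, j in enumerate(others, start=1):
            # Point j is the nearest training point iff the rank-1 closer points
            # are all test points and j is a training point; the other m-1
            # training points are drawn from the n-1-rank farther points.
            weight = comb(n - 1 - rank, m - 1) / total_splits
            if labels[j] != labels[i]:
                error += weight
    # By symmetry every point is a test point equally often, so average over points.
    return error / n


def brute_force_cv_error_1nn(points, labels, m):
    """Same quantity by explicit enumeration; only feasible for tiny n."""
    n = len(points)
    per_split = []
    for train in combinations(range(n), m):
        train_set = set(train)
        test = [i for i in range(n) if i not in train_set]
        wrong = 0
        for i in test:
            nn = min(train, key=lambda j: (sq_dist(points[i], points[j]), j))
            wrong += labels[nn] != labels[i]
        per_split.append(wrong / len(test))
    return sum(per_split) / len(per_split)


# Hypothetical toy data, used only to compare the two computations.
toy_points = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.1), (2.0, 2.1), (2.5, 1.7), (3.1, 2.9)]
toy_labels = [0, 0, 1, 1, 1, 0]
```

In this sketch the closed-form version sorts the neighbors of each point once, so its cost grows polynomially in n, whereas the brute-force version visits all C(n, m) partitions. On small inputs the two should agree up to floating-point rounding, which gives a convenient sanity check.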

machine learning

Associated Center(s) / Consortia: Vision and Autonomous Systems Center

Text Reference
Matthew Mullin and Rahul Sukthankar, "Complete Cross-Validation for Nearest Neighbor Classifiers," Proceedings of the International Conference on Machine Learning, June, 2000.

BibTeX Reference
   author = "Matthew Mullin and Rahul Sukthankar",
   title = "Complete Cross-Validation for Nearest Neighbor Classifiers",
   booktitle = "Proceedings of the International Conference on Machine Learning",
   month = "June",
   year = "2000",