Complete Cross-Validation for Nearest Neighbor Classifiers

Matthew Mullin and Rahul Sukthankar
Conference Paper, Proceedings of the International Conference on Machine Learning (ICML), pp. 639-646, June 2000

Abstract

Cross-validation is an established technique for estimating the accuracy of a classifier and is normally performed either using a number of random test/train partitions of the data, or using k-fold cross-validation. We present a technique for calculating the complete cross-validation for nearest-neighbor classifiers: i.e., averaging over all desired test/train partitions of the data. The technique extends to several common variants, including K-nearest-neighbor classification, stratified data partitioning, and arbitrary loss functions. We demonstrate, with complexity analysis and experimental timing results, that the technique can be performed in time comparable to k-fold cross-validation, even though it effectively averages an exponential number of trials. We show that the results of complete cross-validation are equally biased compared to subsampling and k-fold cross-validation, with some reduction in variance. This algorithm offers significant benefits in terms of both time and accuracy.
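
The core idea can be sketched for the 1-NN case: for each held-out point, rank the remaining N-1 points by distance; the probability that the rank-r point (0-indexed) is the nearest neighbor in a uniformly random training subset of size n is C(N-2-r, n-1) / C(N-1, n), so the exact average accuracy over all C(N-1, n) partitions follows from a single sorted pass per point. Below is a minimal Python sketch of this combinatorial shortcut; it is not the authors' code, the function and variable names are our own, and distance ties are broken naively rather than handled as in the paper.

import numpy as np
from math import comb

def complete_cv_1nn(X, y, n_train):
    # Exact 1-NN accuracy averaged over ALL C(N-1, n_train) training
    # subsets for each held-out point, computed in polynomial time.
    N = len(X)
    assert 1 <= n_train <= N - 1
    total = comb(N - 1, n_train)
    acc = 0.0
    for i in range(N):
        # Rank the other N-1 points by distance to the held-out point i
        # (ties broken arbitrarily here; the paper treats them more carefully).
        d = np.linalg.norm(X - X[i], axis=1)
        order = [j for j in np.argsort(d) if j != i]
        for r, j in enumerate(order):
            # The rank-r point is the nearest neighbor in a random training
            # subset iff it is chosen and the r closer points are not:
            # probability C(N-2-r, n_train-1) / C(N-1, n_train).
            farther = N - 2 - r
            if farther < n_train - 1:
                break  # points ranked beyond this can never be nearest
            if y[j] == y[i]:
                acc += comb(farther, n_train - 1) / total
    return acc / N

# Example on hypothetical data: this averages over C(149, 100) (roughly
# 10^40) train/test partitions exactly, far too many to enumerate.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = rng.integers(0, 3, size=150)
print(complete_cv_1nn(X, y, n_train=100))

The quadratic-per-point cost of the sorted pass is what makes the exponential average tractable; the paper develops the same counting argument for K > 1, stratified sampling, and general loss functions.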

BibTeX

@conference{Mullin-2000-8050,
author = {Matthew Mullin and Rahul Sukthankar},
title = {Complete Cross-Validation for Nearest Neighbor Classifiers},
booktitle = {Proceedings of (ICML) International Conference on Machine Learning},
year = {2000},
month = {June},
pages = {639--646},
keywords = {machine learning},
}