Bayesian Networks for Lossless Dataset Compression

Scott Davies and Andrew Moore
Conference Paper, Proceedings of 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '99), pp. 387 - 391, August, 1999

Abstract

The recent explosion in research on probabilistic data mining algorithms such as Bayesian networks has been focused primarily on their use in diagnostics, prediction and efficient inference. In this paper, we examine the use of Bayesian networks for a different purpose: lossless compression of large datasets. We present algorithms for automatically learning Bayesian networks and new structures called "Huffman networks" that model statistical relationships in the datasets, and algorithms for using these models to then compress the datasets. These algorithms often achieve significantly better compression ratios than those achieved with common dictionary-based algorithms such as those used by programs like ZIP.
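To illustrate why modeling statistical relationships between columns can help compression, the following is a minimal sketch, not the paper's algorithm. It assumes an ideal entropy coder that spends -log2(p) bits per symbol, ignores the cost of storing the model itself, and uses hypothetical helper names (`code_length_bits`, `independent_bits`, `network_bits`) and a made-up toy dataset. It compares the ideal code length when every column is coded independently with the length when a column is coded conditionally on its parents in a small network.

```python
import math
from collections import Counter

def code_length_bits(counts_by_context):
    """Ideal code length in bits when each symbol costs -log2 of its
    empirical probability within its context (model cost is ignored)."""
    total = 0.0
    for counts in counts_by_context.values():
        n = sum(counts.values())
        for c in counts.values():
            total += -c * math.log2(c / n)
    return total

def independent_bits(rows):
    # Each column is coded with its own marginal distribution.
    n_cols = len(rows[0])
    total = 0.0
    for j in range(n_cols):
        total += code_length_bits({(): Counter(r[j] for r in rows)})
    return total

def network_bits(rows, parents):
    # parents[j] is a tuple of column indices that column j is conditioned on
    # (an assumed, hand-specified structure; nothing is learned here).
    total = 0.0
    for j, pa in enumerate(parents):
        contexts = {}
        for r in rows:
            key = tuple(r[p] for p in pa)
            contexts.setdefault(key, Counter())[r[j]] += 1
        total += code_length_bits(contexts)
    return total

# Toy dataset (hypothetical): the second column usually copies the first.
rows = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 0), (1, 1), (0, 1), (1, 0)]

print("independent columns:", independent_bits(rows))
print("column 1 given column 0:", network_bits(rows, parents=[(), (0,)]))
```

On this toy data the conditional model needs fewer bits than the independent one, because knowing the first column makes the second more predictable. A real compressor would also have to encode the network and its parameters, which this sketch leaves out.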

BibTeX

@conference{Davies-1999-16687,
author = {Scott Davies and Andrew Moore},
title = {Bayesian Networks for Lossless Dataset Compression},
booktitle = {Proceedings of 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '99)},
year = {1999},
month = {August},
pages = {387 - 391},
}