Scalable and robust group discovery on large transactional data - Robotics Institute Carnegie Mellon University

Pak Yan Choi, Andrew Moore, and Jeremy Martin Kubica
Tech. Report, CMU-RI-TR-05-60, Robotics Institute, Carnegie Mellon University, December, 2005

Abstract

The need for time-critical analysis and understanding of the underlying group structure in transactional data has been growing in domains such as law enforcement and customs. Kubica et al. (2003) proposed k-groups, an algorithm based on a probabilistic generative model for discovering underlying groups in data. Although k-groups is reported to be significantly faster than its predecessor GDA (Kubica et al., 2002), it is still too slow and memory-intensive for large datasets in practice. This paper presents XGDA, a framework for scalable and robust group discovery. An evaluation of XGDA and k-groups shows that XGDA can handle extremely large datasets in reasonable time and yields more robust solutions than k-groups.

BibTeX

@techreport{Choi-2005-9362,
author = {Pak Yan Choi and Andrew Moore and Jeremy Martin Kubica},
title = {Scalable and robust group discovery on large transactional data},
year = {2005},
month = {December},
institution = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-05-60},
}