Tractable Group Detection on Large Link Data Sets
J.M. Kubica, A. Moore, and J. Schneider
The Third IEEE International Conference on Data Mining, IEEE Computer Society, November, 2003, pp. 573-576.

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Abstract

Discovering underlying structure from co-occurrence data is an important task in a variety of fields, including insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. Previously, Kubica et al. presented the group detection algorithm (GDA), an algorithm for finding underlying groupings of entities from co-occurrence data. This algorithm is based on a probabilistic generative model and produces coherent groups that are consistent with prior knowledge. Unfortunately, the optimization used in GDA is slow, potentially making it infeasible for many large data sets. To address this, we present k-groups, an algorithm that uses an approach similar to that of k-means to significantly accelerate the discovery of groups while retaining GDA's probabilistic model. We compare the performance of GDA and k-groups on a variety of data, showing that the loss in solution quality under k-groups is significantly offset by its increase in speed.
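
For readers unfamiliar with the k-means style of alternating optimization mentioned above, the following is a minimal sketch. It is not the k-groups algorithm and does not use GDA's probabilistic model: it runs plain k-means on a binary entity-by-link incidence matrix, alternating a hard assignment step with a centroid update step. The function name `kmeans_groups`, the representation of links as sets, and the toy data are all assumptions made for illustration.

```python
import random


def kmeans_groups(links, k, n_iter=20, seed=0):
    """Toy hard-assignment clustering of entities by the links they appear in.

    Illustrative only: plain k-means on incidence vectors, not k-groups.
    """
    entities = sorted({e for link in links for e in link})
    # Binary incidence vector per entity: 1.0 if the entity appears in link j.
    vec = {e: [1.0 if e in link else 0.0 for link in links] for e in entities}
    rng = random.Random(seed)
    # Initialise the k centroids from k distinct entities.
    centroids = [list(vec[e]) for e in rng.sample(entities, k)]
    assign = {}
    for _ in range(n_iter):
        # Assignment step: each entity joins the nearest centroid
        # (squared Euclidean distance).
        new_assign = {
            e: min(
                range(k),
                key=lambda g: sum(
                    (a - b) ** 2 for a, b in zip(vec[e], centroids[g])
                ),
            )
            for e in entities
        }
        if new_assign == assign:
            break  # converged: no entity changed group
        assign = new_assign
        # Update step: each centroid becomes the mean of its members.
        for g in range(k):
            members = [vec[e] for e in entities if assign[e] == g]
            if members:  # an empty group keeps its previous centroid
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical link data: two sets of entities that never co-occur.
links = [{"a", "b", "c"}] * 3 + [{"x", "y", "z"}] * 3
groups = kmeans_groups(links, k=2)
```

The speed of this style of method comes from the fact that each iteration touches every entity once, rather than searching over whole groupings. How k-groups combines that idea with the generative model is described in the paper itself.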


Notes

Number of pages: 4


Text Reference

J.M. Kubica, A. Moore, and J. Schneider, "Tractable Group Detection on Large Link Data Sets," The Third IEEE International Conference on Data Mining, IEEE Computer Society, November, 2003, pp. 573-576.


BibTeX Reference

@inproceedings{Kubica_2003_4548,
   author = "Jeremy Martin Kubica and Andrew Moore and Jeff Schneider",
   title = "Tractable Group Detection on Large Link Data Sets",
   booktitle = "The Third IEEE International Conference on Data Mining",
   month = "November",
   year = "2003",
   pages = "573-576",
   publisher = "IEEE Computer Society"
}

