Diamond: A Storage Architecture for Early Discard in Interactive Search - Robotics Institute Carnegie Mellon University

Larry Huston, Rahul Sukthankar, Rajiv Wickremesinghe, M. Satyanarayanan, Gregory R. Ganger, Erik Riedel, and Anastassia Ailamaki
Conference Paper, Proceedings of USENIX File and Storage Technologies (FAST '04), April 2004

Abstract

This paper explores the concept of early discard for interactive search of unindexed data. Processing data directly inside active storage devices using downloaded searchlet code enables Diamond to perform efficient, application-specific filtering of large data collections. Early discard helps users who are looking for "needles in a haystack" by eliminating the bulk of the irrelevant items as early as possible. A searchlet consists of a set of application-generated filters that Diamond uses to determine whether an object may be of interest to the user. The system optimizes the evaluation order of the filters based on run-time measurements of each filter's selectivity and computational cost. Diamond can also dynamically partition computation between the storage devices and the host computer to adjust for changes in hardware and network conditions. We provide an analysis of the behavior of our system and present performance numbers from a Linux-based prototype showing that Diamond can dynamically adapt to a query and run-time system state. An informal user study of an image retrieval application supports our belief that early discard significantly improves the quality of interactive searches.
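To make the filter-ordering idea concrete, the following is a minimal sketch and not Diamond's actual API. It assumes each filter has a measured mean cost and a measured pass rate, and that filters pass objects independently of one another. Under those assumptions, sorting filters by cost divided by drop rate is a standard rule that minimizes the expected work per object. The names `Filter`, `rank`, `order_filters`, `expected_cost`, and `matches`, as well as the example numbers, are hypothetical and chosen only for illustration; the abstract does not state which ordering rule Diamond uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Filter:
    """Hypothetical filter record; not a Diamond data structure."""
    name: str
    predicate: Callable[[dict], bool]
    cost: float       # measured mean cost per object (arbitrary units)
    pass_rate: float  # measured fraction of objects that pass (0.0 to 1.0)


def rank(f: Filter) -> float:
    """Cost paid per unit of discarded data; lower ranks should run first."""
    drop_rate = 1.0 - f.pass_rate
    return f.cost / drop_rate if drop_rate > 0 else float("inf")


def order_filters(filters: List[Filter]) -> List[Filter]:
    """Sort filters so that cheap, highly selective ones run earliest."""
    return sorted(filters, key=rank)


def expected_cost(filters: List[Filter]) -> float:
    """Expected cost per object, assuming independent filters."""
    total, reach = 0.0, 1.0
    for f in filters:
        total += reach * f.cost  # only objects that got this far pay the cost
        reach *= f.pass_rate     # fraction that survives to the next filter
    return total


def matches(obj: dict, filters: List[Filter]) -> bool:
    """Early discard: all() stops at the first filter that rejects the object."""
    return all(f.predicate(obj) for f in filters)


# Assumed example measurements for three image filters.
filters = [
    Filter("face", lambda o: o.get("has_face", False), cost=20.0, pass_rate=0.2),
    Filter("color", lambda o: o.get("red", 0) > 100, cost=1.0, pass_rate=0.5),
    Filter("texture", lambda o: o.get("rough", False), cost=5.0, pass_rate=0.1),
]
ordered = order_filters(filters)
```

In this sketch the statistics are fixed inputs. A system that adapts at run time would update `cost` and `pass_rate` as objects are processed and re-sort periodically.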

BibTeX

@conference{Huston-2004-8891,
author = {Larry Huston and Rahul Sukthankar and Rajiv Wickremesinghe and M. Satyanarayanan and Gregory R. Ganger and Erik Riedel and Anastassia Ailamaki},
title = {Diamond: A Storage Architecture for Early Discard in Interactive Search},
booktitle = {Proceedings of Usenix File and Storage Technologies (FAST '04)},
year = {2004},
month = {April},
keywords = {active storage, information retrieval},
}