Dan Pelleg
Postdoctoral Fellow

No longer a member of RI.

For more information, see my personal homepage.



Research interests

Whenever we approach the so-called “data mining” problem, we realize it means different things to different people. Scientists and analysts, the consumers of algorithms and of data products, think in terms of tasks: pattern recognition, structural organization, regression, anomaly finding, and so on. We as computer scientists, the producers of algorithms and tools, break the problem down into its building blocks: statistics, computational complexity, and knowledge management.

At first glance, this disparity would seem likely to breed false expectations and impossible requirements. But the truth is that this very tension is what advances research in the field. Here is how it typically happens. A scientist has access to some source of data, say experiments performed in his lab. Over time he has accumulated a set of tools and techniques to analyze it. But recently, the amount of data has become much larger. Possibly, new internet-based collaboration points give him easy access to the results of other researchers' work. Or perhaps new machinery and methods are producing data orders of magnitude better, and faster, than before. The Sloan Digital Sky Survey is a prime example of this. Its goal is to map, in detail, one-quarter of the entire sky. The estimated size of the catalog, due to be completed in 2007, is 200 million objects, including images and spectroscopic data. The database will then encompass 5 terabytes of catalog data, and 25 terabytes of data overall.

The unforeseen outcome of such endeavors is that suddenly, the old tools become useless. It might be because their theoretical complexity is poor and they blow up on large inputs. Or because studying a single experiment is no longer interesting when one can potentially draw conclusions from thousands of similar observations. Or because new results arrive faster than an expert can internalize them, as the old summarization and visualization methods are inadequate.

I seek to scale algorithms so that they are fit for use in this new world. I work to accelerate algorithms and data structures for fast statistical computation. Sometimes I do this for well-known methods, in ways that preserve their functionality or approximate it. In other cases, I look at solutions that restate the problem in a way that makes data analysis more manageable for people.


Research interest keywords

machine learning and statistics


Publications

Note: This list may not be comprehensive. It contains only those publications in the RI publications database. Entries are listed in reverse chronological order.


The Robotics Institute is part of the School of Computer Science, Carnegie Mellon University.