
The Robotics Institute

Carnegie Mellon Robotics Institute

Building Domain-Specific Search Engines with Machine Learning Techniques

Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore
AAAI Spring Symposium on Intelligent Agents in Cyberspace 1999, 1999.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
Domain-specific search engines are growing in popularity because they offer increased accuracy and extra functionality not possible with general, Web-wide search engines. For example, www.campsearch.com allows complex queries over summer camps by age group, size, location and cost. Unfortunately, these domain-specific search engines are difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, information extraction and text classification that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies. Using these techniques, we have built a demonstration system: a search engine for computer science research papers. It already contains over 50,000 papers and is publicly available at www.cora.justresearch.com.
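A minimal sketch of the general idea behind reinforcement-learning spidering, offered as an illustration under stated assumptions and not as the authors' implementation: each page is a state, following a hyperlink is an action, landing on a target document (here, a research paper) earns a reward, and the spider prefers the link with the highest discounted future reward. The toy link graph, the reward of 1.0 per paper, the discount factor `gamma = 0.5`, and the function names `state_values` and `best_link` are all hypothetical. Values are computed here by plain value iteration on a fully known graph, whereas a real spider would have to learn to predict these values for links it has not yet followed.

```python
from typing import Dict, List, Tuple

# Hypothetical toy web: each page maps to the pages it links to.
TOY_GRAPH: Dict[str, List[str]] = {
    "home": ["people", "news"],
    "people": ["alice"],
    "alice": ["paper1", "paper2"],
    "news": [],
    "paper1": [],
    "paper2": [],
}

# Hypothetical reward: 1.0 for landing on a research paper, 0.0 elsewhere.
TOY_REWARD: Dict[str, float] = {"paper1": 1.0, "paper2": 1.0}


def state_values(graph: Dict[str, List[str]], reward: Dict[str, float],
                 gamma: float = 0.5, iterations: int = 50) -> Dict[str, float]:
    """Value iteration: V(p) = max over links p->q of reward(q) + gamma * V(q).

    Pages with no out-links get value 0. Rewards are not removed once
    collected (a simplification for this toy example).
    """
    values = {page: 0.0 for page in graph}
    for _ in range(iterations):
        # Each sweep reads only the previous sweep's values.
        values = {
            page: max((reward.get(q, 0.0) + gamma * values[q] for q in links),
                      default=0.0)
            for page, links in graph.items()
        }
    return values


def best_link(graph: Dict[str, List[str]], reward: Dict[str, float],
              page: str, gamma: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Return the out-link of `page` with the highest Q-value, plus all Q-values.

    Assumes `page` has at least one out-link.
    """
    values = state_values(graph, reward, gamma)
    q_values = {q: reward.get(q, 0.0) + gamma * values[q] for q in graph[page]}
    return max(q_values, key=q_values.get), q_values


print(best_link(TOY_GRAPH, TOY_REWARD, "home"))
# prints ('people', {'people': 0.25, 'news': 0.0})
```

On this toy graph the spider prefers `people` over `news` from `home`, because that path reaches two papers two hops later, while `news` leads nowhere. Because collected rewards are never removed, values on a graph with cycles would overcount papers that can be revisited.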


Text Reference
Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore, "Building Domain-Specific Search Engines with Machine Learning Techniques," AAAI Spring Symposium on Intelligent Agents in Cyberspace 1999, 1999.

BibTeX Reference
@inproceedings{Seymore_1999_2716,
   author = "Andrew McCallum and Kamal Nigam and Jason Rennie and Kristie Seymore",
   title = "Building Domain-Specific Search Engines with Machine Learning Techniques",
   booktitle = "AAAI Spring Symposium on Intelligent Agents in Cyberspace 1999",
   year = "1999",
}