Feature Selection for Extracting Semantically Rich Words

Young-Woo Seo, Anupriya Ankolekar, and Katia Sycara

Tech. Report, CMU-RI-TR-04-18, Robotics Institute, Carnegie Mellon University, March 2004

Abstract

The utility of semantic knowledge, in the form of ontologies, is widely acknowledged. In particular, semantic knowledge facilitates the integration, visualization, and maintenance of information from various sources. However, most previous work in this field has tried to learn ontologies for relatively constrained domains. To date, there has been relatively little work on constructing ontologies for an open domain, where the need for such ontologies is enormous. Moreover, few studies have empirically examined the value of text learning techniques for extracting a set of candidate words for concept words in a domain ontology. The goal of this work is to examine the usefulness of existing feature selection methods for extracting a set of good candidate words for concept words in an ontology. From the experimental results, we found that existing word feature selection methods are quite useful for ontology learning, in that there is a good overlap between the word sets identified by feature selection methods and the words in a manually built domain ontology. Finally, drawing on our experience in this work, we enumerate the desiderata for a domain ontology learning system.
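
The abstract does not name the feature selection methods or the overlap measure used in the report. As a rough illustration only, the sketch below assumes a chi-square word score (one common feature selection statistic, restricted here to words positively associated with the class) and reports overlap as precision and recall against a set of ontology words. The function names, the toy documents, and the ontology word set are all hypothetical and are not taken from the report.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Chi-square score of each word against the target class.

    Assumption: a one-sided variant that zeroes out words which are
    not positively associated with the target class.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count document frequency, so each word counts once per document.
        (df_pos if y == target else df_neg).update(set(tokens))
    scores = {}
    for w in set(df_pos) | set(df_neg):
        a, b = df_pos[w], df_neg[w]      # docs containing w: in class, out of class
        c, d = n_pos - a, n_neg - b      # docs lacking w: in class, out of class
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        if denom == 0 or a * d <= b * c:
            scores[w] = 0.0
        else:
            scores[w] = n * (a * d - b * c) ** 2 / denom
    return scores


def top_k(scores, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def overlap(selected, ontology_words):
    """Precision and recall of the selected words against the ontology words."""
    selected, ontology_words = set(selected), set(ontology_words)
    hits = selected & ontology_words
    precision = len(hits) / len(selected) if selected else 0.0
    recall = len(hits) / len(ontology_words) if ontology_words else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
docs = [
    ["course", "professor", "lecture"],
    ["course", "homework", "lecture"],
    ["project", "research", "funding"],
    ["research", "lab", "funding"],
]
labels = ["course", "course", "project", "project"]
ontology = {"course", "lecture", "professor", "student"}  # hypothetical concept words

scores = chi_square_scores(docs, labels, target="course")
selected = top_k(scores, k=3)
precision, recall = overlap(selected, ontology)
print(selected, precision, recall)
```

Precision here reads as the fraction of selected words that also appear in the ontology, and recall as the fraction of ontology words that were recovered. Other scoring functions, such as information gain or document frequency, could be swapped in for `chi_square_scores` without changing the overlap computation.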

BibTeX

@techreport{Seo-2004-8868,
author = {Young-Woo Seo and Anupriya Ankolekar and Katia Sycara},
title = {Feature Selection for Extracting Semantically Rich Words},
year = {2004},
month = {March},
institution = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-04-18},
keywords = {ontology learning, text learning, feature selection, machine learning},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.