Feature Selection for Extracting Semantically Rich Words

Young-Woo Seo, Anupriya Ankolekar, and Katia Sycara
tech. report CMU-RI-TR-04-18, Robotics Institute, Carnegie Mellon University, March, 2004



Abstract
The utility of semantic knowledge, in the form of ontologies, is widely acknowledged. In particular, semantic knowledge facilitates the integration, visualization, and maintenance of information from diverse sources. However, most previous work in this field has learned ontologies for relatively constrained domains; to date, there has been little work on constructing ontologies for open domains, where such ontologies are greatly needed. Moreover, few studies have empirically examined the value of text learning techniques for extracting candidate concept words for a domain ontology. The goal of this work is to examine how useful existing feature selection methods are for extracting good candidate concept words for an ontology. Our experimental results show that existing word feature selection methods are quite useful for ontology learning: the word sets they identify overlap considerably with the words in a manually built domain ontology. Finally, drawing on our experience with this work, we enumerate desiderata for a domain ontology learning system.
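The abstract does not say which feature selection methods were evaluated or how overlap was measured. As a rough illustration only, the sketch below assumes one common criterion, the chi-square statistic between a word's presence in a document and the document's domain membership. It ranks positively associated words by that score and measures overlap as the fraction of top-ranked words that also appear in an ontology's vocabulary. The toy corpus, the ontology word set, and all function names (`chi_square_scores`, `select_candidates`, `overlap_with_ontology`) are hypothetical and are not taken from the report.

```python
from collections import Counter
from typing import Dict, List, Set


def chi_square_scores(docs: List[Set[str]], in_domain: List[bool]) -> Dict[str, float]:
    """Chi-square score for each word positively associated with the domain.

    docs[i] is the set of words in document i; in_domain[i] says whether
    document i belongs to the target domain (hypothetical setup).
    """
    n = len(docs)
    n_pos = sum(in_domain)
    n_neg = n - n_pos
    df_pos: Counter = Counter()  # document frequency among in-domain documents
    df_neg: Counter = Counter()  # document frequency among out-of-domain documents
    for words, label in zip(docs, in_domain):
        (df_pos if label else df_neg).update(words)

    scores: Dict[str, float] = {}
    for w in set(df_pos) | set(df_neg):
        a = df_pos[w]    # in-domain docs containing w
        b = df_neg[w]    # out-of-domain docs containing w
        c = n_pos - a    # in-domain docs without w
        d = n_neg - b    # out-of-domain docs without w
        # Chi-square is symmetric, so keep only words that occur more often in
        # the domain than expected; a*d > b*c also makes the denominator nonzero.
        if a * d > b * c:
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            scores[w] = n * (a * d - b * c) ** 2 / denom
    return scores


def select_candidates(docs: List[Set[str]], in_domain: List[bool], k: int) -> List[str]:
    """Top-k candidate concept words, ties broken alphabetically."""
    scores = chi_square_scores(docs, in_domain)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def overlap_with_ontology(candidates: List[str], ontology_words: Set[str]) -> float:
    """Fraction of candidate words that also appear in the ontology's vocabulary."""
    if not candidates:
        return 0.0
    return len(set(candidates) & ontology_words) / len(candidates)


# Hypothetical toy data: three in-domain and three out-of-domain documents.
DOCS = [
    {"ontology", "concept", "the"},
    {"ontology", "class", "the"},
    {"ontology", "concept", "class", "the"},
    {"football", "the"},
    {"football", "score", "the"},
    {"weather", "the"},
]
IN_DOMAIN = [True, True, True, False, False, False]
TOY_ONTOLOGY = {"ontology", "concept", "agent"}

if __name__ == "__main__":
    candidates = select_candidates(DOCS, IN_DOMAIN, k=3)
    print(candidates)  # ['ontology', 'class', 'concept']
    print(round(overlap_with_ontology(candidates, TOY_ONTOLOGY), 3))  # 0.667
```

Frequent but uninformative words such as "the" occur in every document, so they receive no score, while words concentrated in the domain documents rank highest. Other criteria (information gain, mutual information, document frequency) could be substituted in `chi_square_scores` without changing the overlap computation.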

Keywords
ontology learning, text learning, feature selection, machine learning

Notes
Sponsor: DARPA, Air Force Research Laboratory
Grant ID: F30601-00-2-0592
Associated Center(s) / Consortia: Center for Integrated Manufacturing Decision Systems
Associated Lab(s) / Group(s): Advanced Agent - Robotics Technology Lab

Text Reference
Young-Woo Seo, Anupriya Ankolekar, and Katia Sycara, "Feature Selection for Extracting Semantically Rich Words," tech. report CMU-RI-TR-04-18, Robotics Institute, Carnegie Mellon University, March, 2004

BibTeX Reference
@techreport{Seo_2004_4624,
   author = "Young-Woo Seo and Anupriya Ankolekar and Katia Sycara",
   title = "Feature Selection for Extracting Semantically Rich Words",
   institution = "Robotics Institute",
   month = "March",
   year = "2004",
   number = "CMU-RI-TR-04-18",
   address = "Pittsburgh, PA",
}