TaxaMiner: An Experimental Framework for Automated Taxonomy Bootstrapping
Hierarchical taxonomies and thesauri are frequently used by content management systems for indexing, search, and categorization. They are also viewed as rudimentary ontologies for the emerging Semantic Web infrastructure. To date, however, developing taxonomies and thesauri has been a human-intensive process that demands substantial cost and time. It is therefore critical to investigate approaches that reduce this human effort and resource commitment. Toward this end, we present an experimental framework for automated taxonomy construction from a large corpus of documents. Our approach involves:
- (a) generation of a document cluster hierarchy;
- (b) extraction of a taxonomy from this hierarchy; and
- (c) assignment of labels to nodes in this taxonomy.
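The abstract does not say which clustering, extraction, or labeling techniques TaxaMiner uses. Purely to illustrate how the three steps above fit together, here is a minimal sketch under assumptions that are not from the talk: bag-of-words term vectors, naive greedy centroid-linkage agglomerative clustering for step (a), a hypothetical `min_gain` rule that splices loosely cohesive binary nodes into their parents for step (b), and most-frequent-term labels for step (c).

```python
import math
from collections import Counter


def vectorize(doc):
    """Bag-of-words term-frequency vector (lower-cased, whitespace tokens)."""
    return Counter(doc.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_hierarchy(vectors):
    """Step (a), assumed technique: greedy agglomerative clustering into a binary
    tree. Each round merges the two clusters whose summed term vectors (equivalent
    to centroids under cosine) are most similar. Naive O(n^3); illustration only."""
    clusters = [{"docs": [i], "vec": Counter(v), "children": [], "sim": 1.0}
                for i, v in enumerate(vectors)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i]["vec"], clusters[j]["vec"])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        a, b = clusters[i], clusters[j]
        merged = {"docs": a["docs"] + b["docs"], "vec": a["vec"] + b["vec"],
                  "children": [a, b], "sim": s}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


def extract_taxonomy(node, min_gain=0.1):
    """Step (b), hypothetical rule: an internal child whose merge similarity is not
    at least `min_gain` above its parent's is spliced into the parent, turning the
    binary cluster tree into an n-ary taxonomy."""
    if not node["children"]:
        return node
    kids, pending = [], list(node["children"])
    while pending:
        child = pending.pop(0)
        if child["children"] and child["sim"] - node["sim"] < min_gain:
            pending = list(child["children"]) + pending  # splice grandchildren in
        else:
            kids.append(extract_taxonomy(child, min_gain))
    return {**node, "children": kids}


def label(node, top_k=2):
    """Step (c), assumed technique: label every node with its top_k most frequent terms."""
    node["label"] = [t for t, _ in node["vec"].most_common(top_k)]
    for child in node["children"]:
        label(child, top_k)
    return node


if __name__ == "__main__":
    docs = ["cat dog pet", "dog cat puppy", "stock bond market", "bond stock fund"]
    tree = build_hierarchy([vectorize(d) for d in docs])
    taxonomy = label(extract_taxonomy(tree, min_gain=0.1))
    for child in taxonomy["children"]:
        print(child["label"], child["docs"])
```

Raising the hypothetical `min_gain` flattens the taxonomy further; with a large enough value, every internal node is spliced away and all documents hang directly off the root. A real system would at least remove stopwords and weight terms (for example, TF-IDF) before clustering.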
Speaker Bio
Vipul Kashyap is a Senior Medical Informatician in the Clinical Informatics Research & Development group at Partners HealthCare System. He is currently the chief architect of a Clinical Knowledge Management portal that Partners is rolling out to enable search, browsing, and retrieval of the clinical content and assets of the Partners HealthCare System. Vipul received his PhD from the Department of Computer Science at Rutgers University in New Brunswick and has performed research on semantics and knowledge-based approaches for information and knowledge management. He was a co-project manager of a Knowledge Management effort at Telcordia Technologies (formerly known as Bellcore) focused on knowledge sharing and reuse across Telcordia's Professional Services Units. He was also a fellow at the National Library of Medicine and held a position at the Microelectronics and Computer Technology Corporation (MCC). Vipul has published two books on semantics in information brokering and serves on the editorial boards of the International Journal of Knowledge and Learning and the International Journal of Metadata, Semantics, and Ontologies.