A Brief Survey On Document Clustering Techniques Using MATLAB

International Journal of Computer & Organization Trends (IJCOT)          
 
© 2013 by IJCOT Journal
Volume-3 Issue-1                          
Year of Publication : 2013
Authors : Rachitha Sony.Krotha, Suneetha Merugula

Citation

Rachitha Sony.Krotha, Suneetha Merugula, "A Brief Survey On Document Clustering Techniques Using MATLAB". International Journal of Computer & Organization Trends (IJCOT), V3(1):1-6, Jan-Feb 2013, ISSN: 2249-2593, www.ijcotjournal.org. Published by Seventh Sense Research Group.

Abstract

Document clustering is a specialized technique for unsupervised document organization, and it is generally treated as a centralized process. Clustering methods can automatically group retrieved documents into a list of meaningful categories. This paper surveys some of the most widely used document clustering techniques and introduces MATLAB, whose toolboxes provide many functions that support document clustering. In particular, we concentrate on the two techniques most commonly applied to documents, agglomerative hierarchical clustering and K-means, and on the related functions available in the MATLAB toolbox.
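The abstract names the two techniques without showing how they operate on documents. As an illustration only, the sketch below is written in Python rather than MATLAB and is not the authors' code. It assumes a simple bag-of-words representation (lower-cased, whitespace-split, L2-normalized term frequencies) and Euclidean distance; the function names `doc_vectors`, `kmeans` and `agglomerative` are hypothetical. K-means alternates an assignment step and a centroid-update step until no document changes cluster, while agglomerative clustering starts from one cluster per document and repeatedly merges the closest pair (average linkage is assumed here) until k clusters remain. In MATLAB's Statistics Toolbox the corresponding work is done by functions such as `kmeans`, and `pdist`, `linkage` and `cluster` for hierarchical clustering.

```python
import math
import random
from collections import Counter


def doc_vectors(docs):
    """Turn raw strings into L2-normalized term-frequency vectors (bag of words)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [float(counts[word]) for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero for empty docs
        vectors.append([x / norm for x in vec])
    return vectors


def kmeans(points, k, max_iter=100, seed=0):
    """Plain (Lloyd-style) K-means with Euclidean distance; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct documents as seeds
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member documents.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agglomerative(points, k):
    """Bottom-up clustering with average linkage, stopped when k clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with one singleton per document

    def average_link(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: average_link(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] += clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


docs = [
    "matlab kmeans clustering",
    "kmeans clustering matlab toolbox",
    "football match score",
    "football score goal",
]
vectors = doc_vectors(docs)
print(kmeans(vectors, 2))
print(agglomerative(vectors, 2))
```

On these four toy documents both functions should separate the two MATLAB-related documents from the two football documents. K-means cluster numbering depends on the random seed, so only the grouping, not the label values, is meaningful. Normalizing each vector first keeps long documents from dominating the distances, which is a common choice for text but an assumption of this sketch.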

Keywords

clustering, hierarchical clustering, K-means, MATLAB toolbox