Unsupervised Approach for Document Clustering Using Modified Fuzzy C mean Algorithm

International Journal of Computer & Organization Trends (IJCOT)          
 
© 2011 by IJCOT Journal
Volume-1 Issue-3                          
Year of Publication: 2011
Authors: K. Sathiyakumari, V. Preamsudha, G. Manimekalai

Citation

K. Sathiyakumari, V. Preamsudha, G. Manimekalai, "Unsupervised Approach for Document Clustering Using Modified Fuzzy C mean Algorithm", International Journal of Computer & Organization Trends (IJCOT), V1(3):11-15, Nov-Dec 2011, ISSN 2249-2593, www.ijcotjournal.org. Published by Seventh Sense Research Group.

Abstract

Clustering is one of the main areas of the data mining literature, and many clustering algorithms exist, several of which have been applied to document clustering. Most existing clustering techniques, however, suffer from a wide range of limitations, including poor practical applicability, low accuracy and long classification time. In recent work, incorporating fuzzy logic into clustering has yielded better clustering results, and one of the most widely used fuzzy clustering methods is Fuzzy C-Means (FCM) clustering. To further improve clustering performance, this paper uses Modified Fuzzy C-Means (MFCM) clustering. Before clustering, the documents are ranked using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. The experimental results show that the proposed technique produces better clustering results than the existing technique.
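The pipeline the abstract describes (TF-IDF weighting of documents followed by fuzzy c-means) can be sketched as below. This is a minimal sketch of the standard, unmodified FCM, not the paper's MFCM, whose modification the abstract does not specify. It also assumes that the TF-IDF scores serve as each document's vector. The TF-IDF variant (raw count times log(N/df), L2-normalized), the fuzzifier m = 2, the tolerance, the seeded random initialization and all function names (`tf_idf`, `to_dense`, `fcm`) are illustrative assumptions. Requires Python 3.8+ for `math.dist`.

```python
import math
import random
from collections import Counter


def tf_idf(docs):
    """Map each tokenized document to an L2-normalized TF-IDF dict.

    Assumed weighting: raw term count * log(N / df), scaled to unit length.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        weights = {t: k * math.log(n / df[t]) for t, k in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def to_dense(vectors):
    """Lay the sparse vectors out over one shared, sorted vocabulary."""
    vocab = sorted({t for v in vectors for t in v})
    return [[v.get(t, 0.0) for t in vocab] for v in vectors]


def fcm(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard (unmodified) fuzzy c-means; returns (memberships, centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    exp = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Center j: mean of all points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # u[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        new_u = []
        for i in range(n):
            dist = [math.dist(points[i], cj) for cj in centers]
            zero = [j for j in range(c) if dist[j] == 0.0]
            if zero:  # point lies on a center: split membership among such centers
                new_u.append([1.0 / len(zero) if j in zero else 0.0 for j in range(c)])
            else:
                new_u.append([1.0 / sum((dist[j] / dist[k]) ** exp for k in range(c))
                              for j in range(c)])
        # Stop once no membership moves by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return u, centers


if __name__ == "__main__":
    docs = [
        "fuzzy clustering fuzzy clustering text".split(),
        "fuzzy clustering text".split(),
        "stock market stock prices".split(),
        "stock market prices".split(),
    ]
    u, _ = fcm(to_dense(tf_idf(docs)), c=2)
    for doc, row in zip(docs, u):
        print(" ".join(doc), [round(x, 3) for x in row])
```

Each row of `u` holds one document's degrees of membership and sums to 1. Unlike a hard partition, a document can belong partly to several clusters. Taking the largest entry per row yields a hard clustering when one is needed, for example to score the result with purity or entropy.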

Keywords

Data mining, MFCM algorithm, Purity, Entropy, TF-IDF.
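The keywords name purity and entropy, which are common external measures for scoring a clustering against known class labels. The sketch below uses their usual definitions. Purity is the share of documents that fall in their cluster's majority class. Entropy is the size-weighted class entropy of each cluster. Dividing the entropy by log q (q = number of true classes) so that it lies in [0, 1] is an assumed normalization, and the function names are hypothetical. FCM memberships must first be hardened, for example by assigning each document to its highest-membership cluster.

```python
import math
from collections import Counter, defaultdict


def contingency(clusters, classes):
    """Count the true classes found in each predicted cluster."""
    table = defaultdict(Counter)
    for cluster, cls in zip(clusters, classes):
        table[cluster][cls] += 1
    return table


def purity(clusters, classes):
    """Share of documents that belong to the majority class of their cluster."""
    table = contingency(clusters, classes)
    return sum(max(counts.values()) for counts in table.values()) / len(clusters)


def entropy(clusters, classes):
    """Size-weighted class entropy of the clusters, scaled to [0, 1] by log q.

    q is the number of distinct true classes (assumed normalization); lower is better.
    """
    q = len(set(classes))
    if q < 2:
        return 0.0
    n = len(clusters)
    total = 0.0
    for counts in contingency(clusters, classes).values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log(k / size) for k in counts.values())
        total += (size / n) * h / math.log(q)
    return total


if __name__ == "__main__":
    truth = ["finance", "finance", "fuzzy", "fuzzy"]
    print(purity([0, 0, 1, 1], truth), entropy([0, 0, 1, 1], truth))
```

A clustering that matches the classes exactly has purity 1.0 and entropy 0.0. Putting every document into one cluster lowers purity to the majority-class share and raises entropy toward 1.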