Comparison and Evaluation of scaled data mining algorithms

International Journal of Computer & Organization Trends (IJCOT)          
 
© 2011 by IJCOT Journal
Volume-1 Issue-3                          
Year of Publication : 2011
Authors : M Afshar Alam, Sapna Jain, Ranjit Biswas

Citation

M Afshar Alam, Sapna Jain, Ranjit Biswas. "Comparison and Evaluation of scaled data mining algorithms". International Journal of Computer & Organization Trends (IJCOT), V1(3):28-34, Nov - Dec 2011, ISSN 2249-2593, www.ijcotjournal.org. Published by Seventh Sense Research Group.

Abstract

Association rule mining is one of the most popular techniques in data mining. Mining association rules is a prototypical problem because data are generated and stored every day in corporate database systems. To manage the resulting knowledge, rules have to be pruned and grouped so that only a reasonable number of rules has to be inspected and analyzed. In this paper we compare the standard association rule algorithms with the proposed Scaled Association Rules algorithm and the AIREP algorithm. The algorithms are compared on factors such as type of dataset, support counting, rule generation, candidate generation, and computational complexity. The conclusions drawn are based on the efficiency, performance, accuracy, and scalability of the algorithms.
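
For readers unfamiliar with the comparison factors named in the abstract, the following is a minimal Python sketch of support counting, candidate generation, and rule generation in a standard Apriori-style miner. It is not the Scaled Association Rules or AIREP algorithm from the paper. The function names, the toy transactions, and the use of an absolute count for `min_support` are assumptions made purely for illustration.

```python
from itertools import combinations


def support_counts(transactions, candidates):
    """Support counting: how many transactions contain each candidate itemset."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def generate_candidates(frequent, k):
    """Candidate generation: build k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({i for s in frequent for i in s})
    return {
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    }


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets with count >= min_support (absolute count)."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while candidates:
        counts = support_counts(transactions, candidates)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return frequent


def generate_rules(frequent, min_confidence):
    """Rule generation: emit (lhs, rhs, confidence) with confidence >= min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is always available here.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical toy data, not taken from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=2)
rules = generate_rules(frequent, min_confidence=0.9)
```

On this toy data the only rule that clears the 0.9 confidence threshold is butter implies bread. The sketch rescans all transactions at every level, which is the cost that scaled or sampling-based variants generally try to reduce.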


Keywords

Association rule, Data Mining, Multidimensional dataset, Pruning, Frequent itemset.