Polynomial Kernel Function based Support Vectors for Data Stream Clustering

International Journal of Computer & Organization Trends (IJCOT)
© 2014 by IJCOT Journal
Volume - 4 Issue - 6
Year of Publication : 2014
Authors : Bethapudi Naga Raju, Amarnath Gadham


Bethapudi Naga Raju, Amarnath Gadham, "Polynomial Kernel Function based Support Vectors for Data Stream Clustering", International Journal of Computer & Organization Trends (IJCOT), V4(6):12-18 Nov - Dec 2014, ISSN:2249-2593, www.ijcotjournal.org. Published by Seventh Sense Research Group.

Abstract—Support vector clustering (SVC) is an important clustering algorithm based on support vector machines (SVMs) and kernel methods. SVC offers several advantages over traditional clustering methods: a global optimum, the ability to handle data sets of arbitrary shape, no need to specify the number of clusters, fewer parameters, and easy treatment of high-dimensional data. SVC consists of two phases: SVM-based training and cluster labeling. By allowing for bounded support vectors (BSVs) in the training phase, the existing SVStream algorithm is capable of identifying overlapping clusters, and its BSV decaying mechanism is designed to automatically detect and remove outliers (noise). However, outlier handling is not optimal when a linear kernel function is used. The proposed system uses a polynomial kernel function to estimate support vectors efficiently and to eliminate irrelevant data points. We present an alternative technique for clustering stream data using the SVM method. Streaming data objects are mapped to a high-dimensional feature space, in which the support vectors obtained from training describe a region of arbitrary shape. In the n-dimensional data space, the boundary of this region forms a set of closed contours enclosing the data points. The streaming KDD data objects enclosed by each contour are defined as a cluster. Experimental results show that the proposed method handles outliers and noise better than existing methods. On the KDD CUP 1999 training data set, the system achieves a high detection accuracy and a low error rate.
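The role of the polynomial kernel in the mapping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The kernel form K(x, y) = (x · y + coef0)^degree is the standard polynomial kernel; the function names, the parameter defaults, and the toy points are hypothetical. As a simplifying assumption, the sphere center in feature space is taken as a uniformly weighted mean of the mapped points, whereas SVC obtains the weights by solving an optimization problem. The sketch only shows how a kernel-induced distance separates a distant point from a compact group.

```python
from typing import List, Sequence


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      degree: int = 2, coef0: float = 1.0) -> float:
    """Standard polynomial kernel: K(x, y) = (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def feature_space_sq_distance(x: Sequence[float],
                              points: List[Sequence[float]],
                              weights: List[float],
                              degree: int = 2,
                              coef0: float = 1.0) -> float:
    """Squared feature-space distance from Phi(x) to the center
    a = sum_j weights[j] * Phi(points[j]), using only kernel values:
    K(x, x) - 2 * sum_j w_j K(p_j, x) + sum_ij w_i w_j K(p_i, p_j).
    """
    def k(u: Sequence[float], v: Sequence[float]) -> float:
        return polynomial_kernel(u, v, degree, coef0)

    self_term = k(x, x)
    cross_term = sum(w * k(p, x) for w, p in zip(weights, points))
    center_term = sum(wi * wj * k(pi, pj)
                      for wi, pi in zip(weights, points)
                      for wj, pj in zip(weights, points))
    return self_term - 2.0 * cross_term + center_term


# Toy data (hypothetical): four nearby points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
# Assumption: uniform weights instead of weights learned by SVC training.
weights = [1.0 / len(points)] * len(points)

distances = [feature_space_sq_distance(p, points, weights) for p in points]
for p, d in zip(points, distances):
    print(p, round(d, 3))
```

In this toy setting the distant point (3.0, 3.0) has the largest feature-space distance, so a radius threshold would leave it outside the enclosing contour. In SVC proper, the weights and the radius come from training, and points outside the sphere correspond to BSVs.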


[1] Chang-Dong Wang, “SVStream: A Support Vector-Based Algorithm for Clustering Data Streams,” IEEE Trans. Knowledge Data Eng., vol. 25, no. 6, June 2013.
[2] H. Zhang, A.C. Berg, M. Maire, and J. Malik, “SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition,” in CVPR (2), pp. 2126-2136, 2006.
[3] H. Kargupta and B.-H. Park, “A Fourier Spectrum-Based Approach to Represent Decision Trees for Mining Data Streams in Mobile Environments,” IEEE Trans. Knowledge Data Eng., vol. 16, no. 2, pp. 216-229, Feb. 2004.
[4] P. Zhang, X. Zhu, and Y. Shi, “Categorizing and Mining Concept Drifting Data Streams,” Proc. 14th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, 2008.
[5] P. Wang, H. Wang, X. Wu, W. Wang, and B. Shi, “A Low-Granularity Classifier for Data Streams with Concept Drifts and Biased Class Distribution,” IEEE Trans. Knowledge Data Eng., vol. 19, no. 9, pp. 1202-1213, Sept. 2007.
[6] F. Cao, M. Ester, W. Qian, and A. Zhou, “Density-Based Clustering over an Evolving Data Stream with Noise,” Proc. Sixth SIAM Int’l Conf. Data Mining, 2006.
[7] C.C. Aggarwal, J. Han, J. Wang, and P.S. Yu, “A Framework for on-Demand Classification of Evolving Data Streams,” IEEE Trans. Knowledge Data Eng., vol. 18, no. 5, pp. 577-589, May 2006.
[8] S. Hashemi, Y. Yang, Z. Mirzamomen, and M. Kangavari, “Adapted One-versus-All Decision Trees for Data Stream Classification,” IEEE Trans. Knowledge Data Eng., vol. 21, no. 5, pp. 624-637, May 2009.
[9] N. Segata and E. Blanzieri, “Fast and Scalable Local Kernel Machines,” Journal of Machine Learning Research, vol. 11, pp. 1883-1926, 2010.