Distributed Data Clustering
Citation
Sundas Charagh, Ayesha Saleem, Amnah Mukhtar, "Distributed Data Clustering", International Journal of Computer & Organization Trends (IJCOT), V5(3):36-39, May-Jun 2015, ISSN: 2249-2593, www.ijcotjournal.org. Published by Seventh Sense Research Group.
Abstract In the modern era, the volume of data grows day by day, and handling it without data mining has become impractical. Data mining offers several techniques, and clustering is one of them. Clustering is the process of grouping similar objects. In distributed data clustering, these groups are formed across different sites and then centralized at a global site. The motivations for distributing the clusters are efficiency, performance, communication cost, and storage limits. Many techniques and algorithms are available for distributed data clustering. They fall into two categories, synchronous and asynchronous, which have further sub-categories such as k-means, k-harmonic means, DBSCAN, PCA-based methods, and many more. The paper also describes the main merits and demerits of distributed data clustering.
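As a rough illustration of the "cluster at the sites, combine at a global site" idea described in the abstract, the following minimal Python sketch shows one possible synchronous scheme for k-means. It is not taken from the paper: the function names (local_statistics, global_update, distributed_kmeans) are hypothetical, and the sketch assumes that every site holds points of the same dimension and that each site sends only per-cluster sums and counts to the global site.

```python
import numpy as np

def local_statistics(points, centroids):
    """Run at one site: assign local points to the nearest centroid and
    return per-cluster coordinate sums and counts (the only data sent out)."""
    # Squared Euclidean distances, shape (n_points, k)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k, dtype=int)
    np.add.at(sums, labels, points)   # accumulate coordinates per cluster
    np.add.at(counts, labels, 1)      # count points per cluster
    return sums, counts

def global_update(stats, centroids):
    """Run at the global site: merge the statistics from all sites
    and recompute the centroids."""
    total_sums = sum(s for s, _ in stats)
    total_counts = sum(c for _, c in stats)
    new_centroids = centroids.astype(float)  # astype returns a copy
    nonempty = total_counts > 0              # keep old centroid if a cluster is empty
    new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return new_centroids

def distributed_kmeans(sites, centroids, iterations=10):
    """Hypothetical driver: 'sites' is a list of arrays, one per site."""
    centroids = np.asarray(centroids, dtype=float)
    for _ in range(iterations):
        # Synchronous round: every site reports before the global update.
        stats = [local_statistics(p, centroids) for p in sites]
        centroids = global_update(stats, centroids)
    return centroids

# Two hypothetical sites holding different parts of the same data set
site_a = np.array([[0, 0], [0, 2], [10, 10]])
site_b = np.array([[2, 0], [2, 2], [10, 12], [12, 10], [12, 12]])
result = distributed_kmeans([site_a, site_b], [[0, 0], [10, 10]])
```

Because sums and counts add up across sites, the merged centroids in this sketch match what ordinary k-means would compute on the pooled data from the same starting centroids, while only k sums and k counts per site travel over the network in each round.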
Keywords
Data mining, clustering, efficiency, performance.