Crawler for Image Acquisition from World Wide Web

International Journal of Computer & Organization Trends (IJCOT)
 
© 2017 by IJCOT Journal
Volume - 7 Issue - 1
Year of Publication : 2017
Authors : R Rajkumar, Dr. M V Sudhamani

Citation

R Rajkumar, Dr. M V Sudhamani "Crawler for Image Acquisition from World Wide Web", International Journal of Computer & Organization Trends (IJCOT), V7(1):28-33 Jan - Feb 2017, ISSN:2249-2593, www.ijcotjournal.org. Published by Seventh Sense Research Group.

Abstract

Due to advances in computer communication and storage technologies, a large amount of image data is available on the World Wide Web (WWW). To locate a particular set of images, existing search engines can be queried with keywords, but they do not filter out unwanted data. To retrieve relevant images for the given keyword(s), an image crawler is designed and implemented. The keyword(s) are submitted as a query to a search engine, and the resulting images are downloaded along with metadata such as URL, filename, file size, and file access date and time. The URLs of the newly downloaded images are then compared with those of the images already in the repository to ensure uniqueness; only images with unique URLs are stored in the repository. The images in the repository will be used to build a novel Content Based Image Retrieval (CBIR) system in future work, and the repository may also serve other purposes. The image crawler tool is thus useful for building image datasets that any CBIR system can use for training and testing.
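The metadata-and-deduplication step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `ImageRecord`, `make_record`, and `Repository`, the in-memory dictionary standing in for the repository, and the exact-string URL comparison (no URL normalization) are all assumptions. Querying the search engine and downloading the files are left out; `make_record` only derives the metadata the abstract lists (URL, filename, file size, access date and time) from a URL and bytes that have already been fetched.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass, field
from datetime import datetime, timezone
from urllib.parse import unquote, urlparse


@dataclass(frozen=True)
class ImageRecord:
    """Metadata kept alongside each downloaded image (hypothetical schema)."""
    url: str
    filename: str
    file_size: int          # size of the downloaded file in bytes
    accessed_at: datetime   # when the file was fetched (UTC)


def make_record(url: str, data: bytes, accessed_at: datetime | None = None) -> ImageRecord:
    """Build a metadata record from an image URL and its downloaded bytes."""
    path = unquote(urlparse(url).path)           # e.g. "/b/dog%20small.png" -> "/b/dog small.png"
    filename = posixpath.basename(path) or "unnamed"
    return ImageRecord(
        url=url,
        filename=filename,
        file_size=len(data),
        accessed_at=accessed_at or datetime.now(timezone.utc),
    )


@dataclass
class Repository:
    """In-memory stand-in for the image repository, keyed by URL (assumption)."""
    records: dict[str, ImageRecord] = field(default_factory=dict)

    def add_if_unique(self, record: ImageRecord) -> bool:
        """Store the record only if its URL is not already in the repository."""
        if record.url in self.records:
            return False
        self.records[record.url] = record
        return True

    def merge(self, batch: list[ImageRecord]) -> list[ImageRecord]:
        """Add a newly downloaded batch; return only the records actually stored."""
        return [r for r in batch if self.add_if_unique(r)]


if __name__ == "__main__":
    repo = Repository()
    repo.add_if_unique(make_record("http://example.com/a/cat.jpg", b"\xff\xd8"))
    batch = [
        make_record("http://example.com/a/cat.jpg", b"\xff\xd8"),  # URL already stored
        make_record("http://example.com/b/dog.png", b"\x89PNG"),   # new URL
    ]
    for rec in repo.merge(batch):
        print(rec.filename, rec.file_size)  # prints: dog.png 4
```

Keying the repository on the URL makes the uniqueness check a dictionary lookup, and it also rejects duplicates that occur within a single new batch. Because the comparison is by URL only, as the abstract describes, two different URLs serving the same image would both be kept.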

Keywords
CBIR, Image Crawler, Metadata, World Wide Web (WWW), Uniform Resource Locator (URL).