International Journal of Education and Management Engineering (IJEME)

ISSN: 2305-3623 (Print), ISSN: 2305-8463 (Online)

Published By: MECS Press

IJEME Vol.6, No.5, Sep. 2016

An Intelligent Survey of Personalized Information Retrieval using Web Scraper

Full Text (PDF, 254KB), pp. 24-31



Bhaskar Ghosh Dastidar, Devanjan Banerjee, Subhabrata Sengupta

Index Terms

Web-Scraping; Information Retrieval; IR; Personal Information Retrieval; Semantic; Web; WWW; Internet; Bots; Spider; Crawler; Jaunt


In this paper, we present an intelligent background survey of Personalized Information Retrieval, a specialized and crucial subfield of Information Retrieval (IR). The retrieval method we examine is Web Scraping, a widely used technique with proven applications across multiple domains.

Cite This Paper

Bhaskar Ghosh Dastidar, Devanjan Banerjee, Subhabrata Sengupta, "An Intelligent Survey of Personalized Information Retrieval using Web Scraper", International Journal of Education and Management Engineering (IJEME), Vol.6, No.5, pp.24-31, 2016. DOI: 10.5815/ijeme.2016.05.03

