Pre-Processing of University Webserver Log Files for Intrusion Detection

Full Text (PDF, 1113KB), PP.20-30


Author(s)

Bukola A. Onyekwelu 1,*, B. K. Alese 2, A. O. Adetunmbi 2

1. Department of Computer Science, Joseph Ayo Babalola University, Ikeji-Arakeji, Osun State, Nigeria

2. Department of Computer Science, Federal University of Technology, Akure, Ondo State, Nigeria

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2017.01.03

Received: 22 Jun. 2016 / Revised: 23 Sep. 2016 / Accepted: 5 Nov. 2016 / Published: 8 Jan. 2017

Index Terms

Web Server Log, Data Preprocessing, Data Cleaning, User Identification, Discretization

Abstract

Web server log files can reveal many useful patterns when analyzed, and the results can be applied in various ways, one of which is detecting intrusions on the web. Obtaining good-quality data and usable results requires data preprocessing. In this research, several stages of data preprocessing were carried out on web server log files collected over a period of five months. The stages are Data Conversion, Session Identification, Data Cleaning and Data Discretization. Data Discretization was carried out in two phases to handle data with continuous attributes, and some comparisons were made on the discretized data. The paper shows that the data becomes clearer and more usable with each preprocessing step, and that the data produced at the final stage offers a wide range of opportunities for further research. Preprocessing web server log files therefore provides a standard processing platform for research using web server logs. The method is also useful for monitoring and studying web usage patterns in a particular domain. Although the research covers web server logs obtained from a university domain, and thus reveals the pattern of web access within a university environment, it can also be applied in e-commerce and other domains.
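
The four stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the log format, the session rule, the cleaning criteria or the discretization method, so everything here is an assumption made for illustration. Specifically, the sketch assumes Common Log Format input, a 30-minute inactivity timeout per IP address for sessions, removal of requests for static resources as the cleaning rule, and equal-width binning for continuous attributes. All function and variable names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: logs are in Common Log Format; the paper's format is not stated.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Assumption: requests for these static resources count as noise.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def convert(lines):
    """Data conversion: turn raw log lines into structured records."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not parse
        rec = match.groupdict()
        rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
        rec["status"] = int(rec["status"])
        rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
        records.append(rec)
    return records


def identify_sessions(records, timeout=timedelta(minutes=30)):
    """Session identification: a new session starts when the same IP
    has been idle for longer than `timeout` (assumed heuristic)."""
    last_seen = {}
    session_of = {}
    next_id = 0
    for rec in sorted(records, key=lambda r: r["time"]):
        ip = rec["ip"]
        if ip not in last_seen or rec["time"] - last_seen[ip] > timeout:
            session_of[ip] = next_id
            next_id += 1
        last_seen[ip] = rec["time"]
        rec["session"] = session_of[ip]
    return records


def clean(records):
    """Data cleaning: drop requests for static resources."""
    return [r for r in records
            if not r["url"].lower().endswith(STATIC_SUFFIXES)]


def discretize(values, bins=3):
    """Data discretization: equal-width binning of a continuous attribute.
    Returns one bin index (0 .. bins-1) per input value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid zero width when all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

The stages are applied in the order the abstract lists them: `convert`, then `identify_sessions`, then `clean`, and finally `discretize` on a continuous attribute such as the response size.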

Cite This Paper

Bukola A. Onyekwelu, B. K. Alese, A. O. Adetunmbi, "Pre-Processing of University Webserver Log Files for Intrusion Detection", International Journal of Computer Network and Information Security (IJCNIS), Vol.9, No.1, pp.20-30, 2017. DOI: 10.5815/ijcnis.2017.01.03
