IJCNIS Vol. 11, No. 4, 8 Apr. 2019, pp. 43-52
Keywords: Network, Intrusion, Machine Learning, NSL-KDD Dataset, Feature Selection
Intrusion detection is one of the most common approaches to detecting malicious activity in a network by analyzing its traffic. Machine Learning (ML) algorithms help to study high-dimensional network traffic and identify abnormal traffic flows with high accuracy. Integrating ML algorithms with dimensionality reduction is crucial for reducing the complexity of processing huge datasets and for detecting intrusions in real time. This paper evaluates the 10 most popular ML algorithms on the NSL-KDD dataset. The algorithms are then ranked to identify the best-performing one on the basis of several parameters, such as specificity, sensitivity and accuracy. An analysis of the top 4 algorithms shows that they take a long time to build their models. Therefore, feature selection is applied to detect intrusions in as little time as possible without compromising accuracy. The experimental results show which algorithm works best, with and without a feature selection/reduction technique, in terms of achieving high accuracy while minimizing model-building time.
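The evaluation loop summarized in the abstract (score each classifier on specificity, sensitivity and accuracy, rank the classifiers, then shrink the feature set to cut model-building time) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: treating "attack" as the positive class, ranking by an unweighted mean of the three metrics, and using information gain as the feature-selection filter are all hypothetical choices, since the abstract does not name the exact ranking rule or selection method. The names `Confusion`, `rank_algorithms` and `select_top_k` are illustrative, and the toy rows only loosely mimic NSL-KDD's `protocol_type` and `flag` fields.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; 'positive' is assumed to mean attack traffic."""
    tp: int  # attacks correctly flagged
    fn: int  # attacks missed
    tn: int  # normal records correctly passed
    fp: int  # normal records wrongly flagged (false alarms)

    @property
    def sensitivity(self) -> float:
        # True positive rate (detection rate): TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all records classified correctly
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)


def rank_algorithms(results: dict[str, Confusion]) -> list[tuple[str, float]]:
    """Rank classifiers by the unweighted mean of three metrics (hypothetical rule)."""
    scores = {
        name: (c.accuracy + c.sensitivity + c.specificity) / 3
        for name, c in results.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column: list[str], labels: list[str]) -> float:
    """Entropy reduction obtained by splitting the records on one discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(rows: list[tuple[str, ...]], labels: list[str], k: int) -> list[int]:
    """Return the indices of the k features with the highest information gain."""
    gains = [
        (j, information_gain([row[j] for row in rows], labels))
        for j in range(len(rows[0]))
    ]
    gains.sort(key=lambda item: item[1], reverse=True)
    return [j for j, _ in gains[:k]]


# Toy usage with made-up counts (not results from the paper):
ranking = rank_algorithms({
    "classifier_A": Confusion(tp=90, fn=10, tn=95, fp=5),
    "classifier_B": Confusion(tp=80, fn=20, tn=99, fp=1),
})
print(ranking[0][0])  # classifier_A

# Feature 0 ~ protocol, feature 1 ~ connection flag (toy values)
rows = [("tcp", "SF"), ("udp", "SF"), ("tcp", "REJ"), ("udp", "REJ")]
labels = ["normal", "normal", "attack", "attack"]
print(select_top_k(rows, labels, k=1))  # [1]
```

In the toy data the flag column separates normal from attack records perfectly, so it is kept, while the protocol column carries no information and is dropped. Training only on the retained columns is what shortens model building; whether accuracy survives the reduction has to be checked empirically, as the abstract notes.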
Prachi, Heena Malhotra, Prabha Sharma, "Intrusion Detection using Machine Learning and Feature Selection", International Journal of Computer Network and Information Security (IJCNIS), Vol.11, No.4, pp.43-52, 2019. DOI:10.5815/ijcnis.2019.04.06