Enhancing Web Security through Machine Learning-based Detection of Phishing Websites

PDF (932KB), PP.39-56

Author(s)

Najla Odeh 1,*, Derar Eleyan 1, Amna Eleyan 2

1. Palestine Technical University Kadoorie / Computer Science Department, Faculty of Information Technology, Tulkarm, P.O. Box 305, Palestine

2. Manchester Metropolitan University / Department of Computing and Mathematics, Manchester M15 6BH, United Kingdom

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2025.01.04

Received: 4 Jul. 2023 / Revised: 30 Oct. 2023 / Accepted: 22 Dec. 2023 / Published: 8 Feb. 2025

Index Terms

Web Security, Phishing, Machine Learning, Cyberattacks, Fake Websites, Blacklists

Abstract

The rise of cyberattacks has led to an increase in fake websites created by attackers, who use these sites to advertise products, transmit malware, or steal valuable login credentials. Phishing, the act of soliciting sensitive information from users by masquerading as a trustworthy entity, is a common technique attackers use to achieve these goals. Phishing attacks often combine spoofed websites with email spoofing: spoofed emails redirect users to phishing websites that trick them into revealing personal information. Traditional solutions for detecting phishing websites rely on signature-based approaches, which are not effective against newly created spoofed websites. To address this challenge, researchers have been exploring machine learning methods for detecting phishing websites. In this paper, we propose a new approach that combines blacklists with machine learning techniques and uses a variety of powerful features, including domain-based features, abnormal features, and features based on URLs, HTML, and JavaScript, to rank web pages and improve classification accuracy. Our experimental results show that, with the proposed approach, the random forest classifier offers the best accuracy of 93%, with a false positive rate (FPR) of 0.12, a false negative rate (FNR) of 0.02, a Precision of 90%, a Recall of 97%, an F1 Score of 93%, and an MCC of 0.85.
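
For readers unfamiliar with the evaluation measures named in the abstract, the following minimal sketch shows how accuracy, FPR, FNR, precision, recall, F1 score, and MCC are conventionally derived from a binary confusion matrix. The function name and the example counts are hypothetical and chosen only for illustration; they are not the paper's data, and the paper's own evaluation code is not shown here.

```python
from math import sqrt


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp: phishing sites flagged as phishing   fp: legitimate sites flagged as phishing
    tn: legitimate sites passed as legitimate fn: phishing sites passed as legitimate
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "fpr": ratio(fp, fp + tn),          # false positive rate
        "fnr": ratio(fn, fn + tp),          # false negative rate
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the paper).
    for name, value in classification_metrics(tp=90, fp=10, tn=90, fn=10).items():
        print(f"{name}: {value:.2f}")
```

Note that FNR and recall are complementary (FNR = 1 - recall), so the two values reported for a classifier should agree up to rounding.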

Cite This Paper

Najla Odeh, Derar Eleyan, Amna Eleyan, "Enhancing Web Security through Machine Learning-based Detection of Phishing Websites", International Journal of Computer Network and Information Security (IJCNIS), Vol.17, No.1, pp.39-56, 2025. DOI: 10.5815/ijcnis.2025.01.04
