Oleksiy Tverdokhlib; Victoria Vysotska; Olena Nagachevska; Yuriy Ushenko; Dmytro Uhryn; Yurii Tomka

Intelligent Processing Censoring Inappropriate Content in Images, News, Messages and Articles on Web Pages Based on Machine Learning

Pages: 107-164


Author(s)

Oleksiy Tverdokhlib ¹, Victoria Vysotska ¹, Olena Nagachevska ¹, Yuriy Ushenko ²,*, Dmytro Uhryn ², Yurii Tomka ²

1. Lviv Polytechnic National University, Lviv, 79013, Ukraine

2. Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2025.01.08

Received: 11 Aug. 2024 / Revised: 26 Oct. 2024 / Accepted: 15 Dec. 2024 / Published: 8 Feb. 2025

Index Terms

Internet Safety, Image, Image Recognition, Hate Speech Identification, News, Message, Article, Censorship, Inappropriate Content, Browser Extension, AI Technology, Content Filtering, Personalization

Abstract

This project aims to enhance the quality of online experiences by giving users greater control over the content they encounter daily. The proposed solution is particularly valuable for parents seeking to safeguard their children, educational institutions striving to foster a more conducive learning environment, and individuals who prioritise ethical internet usage. It also supports users who wish to limit their exposure to misinformation, including fake news, propaganda, and disinformation. Implemented as a browser extension, the system contributes to a safer internet by reducing users' vulnerability to harmful content and promoting a more positive and productive online environment. The primary objective of this work is to develop a browser extension that automatically detects and censors inappropriate text and images on web pages using artificial intelligence (AI) technologies. The extension enables users to personalise censorship settings, including defining custom prohibited words and toggling the filtering of text and images. The accuracy of several classifiers was estimated: Random Forest (0.879), Logistic Regression (0.904), Decision Tree (0.878), Naive Bayes (0.315), and KNN (0.832).
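The personalisation described above (custom prohibited words plus a toggle for text filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: a real browser extension would typically run in JavaScript and combine such a word list with the trained classifiers. The names `CensorSettings` and `censor_text` are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CensorSettings:
    """Hypothetical user settings; names are illustrative, not from the paper."""
    filter_text: bool = True
    filter_images: bool = True  # mirrors the image toggle; unused in this sketch
    prohibited_words: set = field(default_factory=set)


def censor_text(text: str, settings: CensorSettings) -> str:
    """Mask each user-defined prohibited word with asterisks of equal length."""
    if not settings.filter_text or not settings.prohibited_words:
        return text
    # Try longer entries first so they take priority over shorter ones.
    words = sorted(settings.prohibited_words, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)


if __name__ == "__main__":
    settings = CensorSettings(prohibited_words={"spam", "scam"})
    print(censor_text("This Spam is a scam.", settings))
```

Matching is case-insensitive and restricted to whole words, so a word such as "spammer" is left untouched; whether partial matches should also be masked is a design choice this sketch does not take from the paper.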

Cite This Paper

Oleksiy Tverdokhlib, Victoria Vysotska, Olena Nagachevska, Yuriy Ushenko, Dmytro Uhryn, Yurii Tomka, "Intelligent Processing Censoring Inappropriate Content in Images, News, Messages and Articles on Web Pages Based on Machine Learning", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 17, No. 1, pp. 107-164, 2025. DOI: 10.5815/ijigsp.2025.01.08

International Journal of Image, Graphics and Signal Processing (IJIGSP)