Dmytro Uhryn; Victoria Vysotska; Lyubomyr Chyrun; Sofia Chyrun; Cennuo Hu; Yuriy Ushenko

Intelligent Application for Textual Content Authorship Identification based on Machine Learning and Sentiment Analysis

Pages: 56-100

Author(s)

Dmytro Uhryn ¹, Victoria Vysotska ², Lyubomyr Chyrun ³, Sofia Chyrun ⁴, Cennuo Hu ⁵, Yuriy Ushenko ¹,⁶

1. Department of Computer Science, Educational and Research Institute of Physical, Technical and Computer Sciences, Yuriy Fedkovych Chernivtsi National University, 58012, Ukraine

2. Department of Information Systems and Networks, Institute of Computer Sciences and Information Technologies, Lviv Polytechnic National University, Lviv, 79013, Ukraine

3. Applied Mathematics Department, Faculty of Applied Mathematics and Informatics, Ivan Franko National University of Lviv, Lviv, 79000, Ukraine

4. Telecommunication Department, Lviv Polytechnic National University, Lviv, 79013, Ukraine

5. Department of Computer Science, College of Science, Purdue University, West Lafayette, IN 47907, USA

6. Department of Physics, Shaoxing University, Shaoxing, Zhejiang Province 312000, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2025.02.05

Received: 16 Jan. 2025 / Revised: 21 Feb. 2025 / Accepted: 13 Mar. 2025 / Published: 8 Apr. 2025

Index Terms

Machine Learning Methods, Text Analysis, Authorship Identification, Sentiment Analysis, NLP, SVM, LSTM, CNN, RNN

Abstract

The development and implementation of the software system for text analysis focused on the morphological, syntactic, and stylistic levels of language, which made it possible to build detailed authorship profiles for various writers. The main goal of the system is to automate authorship identification and plagiarism detection, which helps protect intellectual property and preserve cultural heritage. The scientific novelty of the research lies in the development of algorithms adapted to the peculiarities of natural language and in the use of advanced technologies such as deep learning and big data. The interdisciplinary approach, which combines computer science, linguistics, and literary studies, opens up new perspectives for the detailed analysis of scholarly works. The results confirm the high efficiency and accuracy of the system in authorship identification, so it can serve as an essential tool for scientists, publishers, and law enforcement agencies. In addition to the technical aspects, it is vital to take into account ethical issues related to confidentiality and copyright protection, so that the process is governed not only by technological considerations but also by moral and legal norms. Thus, the work demonstrates the importance and potential of modern text processing methods for improving literary analysis and protecting cultural heritage, which makes it significant for further research and practical use in this area.

Cite This Paper

Dmytro Uhryn, Victoria Vysotska, Lyubomyr Chyrun, Sofia Chyrun, Cennuo Hu, Yuriy Ushenko, "Intelligent Application for Textual Content Authorship Identification based on Machine Learning and Sentiment Analysis", International Journal of Intelligent Systems and Applications (IJISA), Vol. 17, No. 2, pp. 56-100, 2025. DOI: 10.5815/ijisa.2025.02.05


International Journal of Intelligent Systems and Applications (IJISA)