Danylo Levkivskyi; Victoria Vysotska; Lyubomyr Chyrun; Yuriy Ushenko; Dmytro Uhryn; Cennuo Hu

Agile Methodology of Information Engineering for Semantic Annotations Categorization and Creation in Scientific Articles Based on NLP and Machine Learning Methods

PP. 1-50

Author(s)

Danylo Levkivskyi ¹, Victoria Vysotska ¹, Lyubomyr Chyrun ², Yuriy Ushenko ³,⁴,*, Dmytro Uhryn ⁴, Cennuo Hu ⁵

1. Department of Information Systems and Networks, Institute of Computer Sciences and Information Technologies, Lviv Polytechnic National University, Lviv, 79013, Ukraine

2. Applied Mathematics Department, Faculty of Applied Mathematics and Informatics, Ivan Franko National University of Lviv, Lviv, 79000, Ukraine

3. Department of Physics, Shaoxing University, Shaoxing, Zhejiang Province 312000, China

4. Department of Computer Science, Educational and Research Institute of Physical, Technical and Computer Sciences, Yuriy Fedkovych Chernivtsi National University, 58012, Ukraine

5. Department of Computer Science, College of Science, Purdue University, West Lafayette, IN 47907, USA

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2025.02.01

Received: 19 Jan. 2025 / Revised: 25 Feb. 2025 / Accepted: 20 Mar. 2025 / Published: 8 Apr. 2025

Index Terms

Automatic Annotation, Constraint Propagation Model, Text Relationship Maps, TRM Method

Abstract

Categorizing scientific articles and creating semantic annotations for them is an important research direction given the growing volume of scientific literature. Applying machine learning and natural language processing in this field makes it possible to organize scientific information effectively and provide access to it. The article reviews methods for automatic text annotation. Based on this review, it proposes using the constraint propagation model to improve the text relationship map (TRM) technique. The developed software system automates the analysis and categorization of scientific materials, which can improve the speed and accuracy with which researchers find the information they need. The system uses advanced machine learning models, such as RoBERTa and RAG, for data processing and for creating semantic annotations. After the model was improved, the accuracy of predicting article categories reached 88%. The novelty of the approach lies in combining categorization with semantic annotation to make searching for scientific information faster and more convenient. The software system can be extended and improved in the future with more advanced technologies and machine learning models. The study is relevant, original in its approach, and has potential for practical application in scientific research and in the development of science as a whole. The proposed approach contributes to the Information Engineering and Electronic Business field in the following key ways: automated categorization and annotation of scientific articles, more accurate information search, more efficient scientific research, and a flexible, scalable solution.
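As an illustration of the text relationship map idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a plain bag-of-words cosine similarity, a hypothetical link threshold of 0.2, and selection of sentences by their number of links; the function names are illustrative, and the constraint propagation refinement proposed in the article is not shown.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_relationship_map(sentences, threshold=0.2):
    """Link sentences i and j when their similarity reaches the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    links = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                links[i].add(j)
                links[j].add(i)
    return links


def annotate(sentences, k=2, threshold=0.2):
    """Return the k most connected sentences in their original order."""
    links = build_relationship_map(sentences, threshold)
    # Rank by number of links (descending); ties go to the earlier sentence.
    ranked = sorted(links, key=lambda i: (-len(links[i]), i))
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    text = [
        "Machine learning models classify scientific articles.",
        "Scientific articles need semantic annotations.",
        "The weather was pleasant yesterday.",
        "Semantic annotations help classify articles with machine learning.",
    ]
    for sentence in annotate(text, k=2):
        print(sentence)
```

In this sketch, sentences that share vocabulary with many other sentences are treated as central to the text and are kept as the annotation, while an unrelated sentence receives no links and is dropped.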

Cite This Paper

Danylo Levkivskyi, Victoria Vysotska, Lyubomyr Chyrun, Yuriy Ushenko, Dmytro Uhryn, Cennuo Hu, "Agile Methodology of Information Engineering for Semantic Annotations Categorization and Creation in Scientific Articles Based on NLP and Machine Learning Methods", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.17, No.2, pp. 1-50, 2025. DOI:10.5815/ijieeb.2025.02.01

International Journal of Information Engineering and Electronic Business (IJIEEB)