Workplace: Department of Computer Science, College of Science, Purdue University, West Lafayette, IN 47907, USA
E-mail: hu945@purdue.edu
Website:
Research Interests: Computer Security, Software Engineering
Biography
Cennuo Hu was born in 2005. She is a student in the Department of Computer Science, College of Science, Purdue University, West Lafayette, IN 47907, USA. Her main research interests are software engineering and computer security. She is an IEEE student member and holds three invention patents.
By Danylo Levkivskyi, Victoria Vysotska, Lyubomyr Chyrun, Yuriy Ushenko, Dmytro Uhryn, Cennuo Hu
DOI: https://doi.org/10.5815/ijieeb.2025.02.01, Pub. Date: 8 Apr. 2025
Research devoted to the categorization and creation of semantic annotations for scientific articles is an essential direction of development given the growing volume of scientific literature. Applying machine learning and natural language processing in this field makes it possible to organize scientific information effectively and provide access to it. The article reviews methods of automatic text annotation. Based on this review, a constraint propagation model is proposed to improve the text relationship map technique. The developed software system automates the analysis and categorization of scientific materials, improving the speed and accuracy with which researchers can find the information they need. Advanced machine learning models such as RoBERTa and RAG ensure high-quality data processing and creation of semantic annotations. After the model was improved, the accuracy of predicting article categories reached 88%. The novelty of the approach lies in combining categorization with semantic annotation to make searching for scientific information faster and more convenient. The software system is open to future expansion and improvement through advanced technologies and machine learning models. The study is notable for its relevance, the originality of its approach, and its potential for practical application in scientific research and the development of science as a whole. The proposed approach contributes to the Information Engineering and Electronic Business industry through the following key aspects: automation of categorization and annotation of scientific articles, improved accuracy of information search, increased efficiency of scientific research, and the flexibility and scalability of the solution.
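The abstract does not specify the constraint propagation model itself, so the following is only a toy sketch of the general idea: category labels spread across a text-relationship map whose edges come from term-vector similarity. All article texts, labels, and the similarity threshold here are invented for illustration.

```python
# Toy sketch (not the article's model): propagate known category labels
# across a text-relationship map built from cosine similarity of
# bag-of-words term vectors. Articles, labels, and threshold are invented.
from collections import Counter
from math import sqrt

def term_vector(text):
    """Bag-of-words term-frequency vector for one article."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def propagate_categories(articles, labels, threshold=0.2, max_iter=10):
    """Spread known category labels to unlabeled articles whose
    similarity to a labeled neighbour exceeds the threshold."""
    vecs = {k: term_vector(v) for k, v in articles.items()}
    labels = dict(labels)
    for _ in range(max_iter):
        changed = False
        for k in articles:
            if k in labels:
                continue
            best, best_sim = None, threshold
            for j, cat in labels.items():
                sim = cosine(vecs[k], vecs[j])
                if sim > best_sim:
                    best, best_sim = cat, sim
            if best is not None:
                labels[k] = best
                changed = True
        if not changed:
            break
    return labels

articles = {
    "a1": "machine learning for text classification and annotation",
    "a2": "deep learning text classification models",
    "a3": "finite field arithmetic for cryptography",
}
seed = {"a1": "NLP", "a3": "Security"}
print(propagate_categories(articles, seed))  # a2 inherits the "NLP" label
```

The fixpoint loop is the constraint-propagation flavour: each pass may label new nodes, which in turn can label their neighbours on the next pass.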
By Dmytro Uhryn, Victoria Vysotska, Lyubomyr Chyrun, Sofia Chyrun, Cennuo Hu, Yuriy Ushenko
DOI: https://doi.org/10.5815/ijisa.2025.02.05, Pub. Date: 8 Apr. 2025
During the development and implementation of the software system for text analysis, attention focused on the morphological, syntactic, and stylistic levels of language, making it possible to build detailed authorship profiles for various writers. The system's main goal is to automate authorship identification and plagiarism detection, protecting intellectual property and contributing to the preservation of cultural heritage. The scientific novelty of the research lies in the development of algorithms adapted to the peculiarities of natural language, as well as in the use of advanced technologies such as deep learning and big data. The interdisciplinary approach, combining computer science, linguistics, and literary studies, has opened new perspectives for the detailed analysis of scholarly works. The results confirm the system's high efficiency and accuracy in authorship identification, making it a valuable tool for scientists, publishers, and law enforcement agencies. Beyond the technical aspects, it is vital to take into account ethical issues related to confidentiality and copyright protection, so that not only the technological side of the process but also moral and legal norms remain under control. The work thus demonstrates the importance and potential of modern text processing methods for improving literary analysis and protecting cultural heritage, making it significant for further research and practical use in this area.
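The abstract mentions authorship profiles built from morphological, syntactic, and stylistic levels of language without detailing the features. The sketch below is not the authors' system; it only illustrates the general idea with three crude stylistic features and nearest-profile attribution. The corpus, feature choices, and distance measure are all invented.

```python
# Illustrative sketch (not the article's system): build a small stylistic
# profile per writer and attribute an unseen text to the nearest profile.
import re
from math import sqrt

def style_profile(text):
    """Crude stylistic features: mean word length, mean sentence length,
    and type-token ratio (vocabulary richness)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return (
        sum(len(w) for w in words) / len(words),  # mean word length
        len(words) / len(sentences),              # words per sentence
        len(set(words)) / len(words),             # type-token ratio
    )

def attribute(text, profiles):
    """Return the author whose profile is closest in Euclidean distance."""
    target = style_profile(text)
    return min(
        profiles,
        key=lambda a: sqrt(sum((x - y) ** 2 for x, y in zip(target, profiles[a]))),
    )

corpus = {
    "terse":  "Short words win. Plain style helps. Keep it tight.",
    "ornate": "Magnificent circumlocutions habitually characterise extraordinarily verbose authors.",
}
profiles = {author: style_profile(text) for author, text in corpus.items()}
print(attribute("Crisp text reads well. Small words work.", profiles))  # "terse"
```

A real system would use many more features (function-word frequencies, part-of-speech n-grams, syntactic depth) and a trained classifier rather than raw distance, but the profile-then-match structure is the same.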
By Zhengbing Hu, Victoria Vysotska, Lyubomyr Chyrun, Roman Romanchuk, Yuriy Ushenko, Dmytro Uhryn, Cennuo Hu
DOI: https://doi.org/10.5815/ijmecs.2025.02.02, Pub. Date: 8 Apr. 2025
The main goal of the work is to create an intelligent system that uses NLP methods and machine learning algorithms to analyse and classify the authorship of textual content. The following machine learning models were trained and tested on a dataset of English and Ukrainian publications: Support Vector Classifier, Random Forest, Naive Bayes, Logistic Regression, and Neural Networks. For English, model accuracy was higher due to the larger amount of available text data. For English fiction, the Neural Network classifier outperforms the other models on all evaluated metrics, achieving the highest accuracy (0.97), recall (0.96), F1 score (0.98), and precision (0.96), showing that Neural Networks are particularly effective at capturing the distinctive stylistic features of different English fiction authors. For Ukrainian, accuracy drops by 5-10% because fewer text corpora are available for training. For scientific and technical Ukrainian publications, the Random Forest classifier outperforms the other models on all evaluated metrics, achieving the highest accuracy (0.88), recall (0.87), F1 score (0.87), and precision (0.87), showing that Random Forest is particularly effective at capturing the distinctive writing styles of different Ukrainian authors in scientific and technical texts. The other models performed considerably worse: Support Vector Classifier (77%), Logistic Regression (73%), and Naive Bayes (70%). For Ukrainian fiction, the Random Forest classifier again outperforms the other models on all evaluated metrics, achieving the highest accuracy (0.85), recall (0.84), F1 score (0.84), and precision (0.84), while the other models again performed considerably worse: Support Vector Classifier (77%), Logistic Regression (73%), and Naive Bayes (70%).
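The model comparison above rests on accuracy, precision, recall, and F1. As a minimal sketch of that evaluation step, the following computes macro-averaged versions of these metrics from per-author predictions; the label and prediction lists here are invented examples, not the article's data.

```python
# Minimal sketch of the evaluation step: accuracy plus macro-averaged
# precision, recall, and F1 over per-author predictions.
def macro_metrics(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(labels)
    recall = sum(recalls) / len(labels)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, precision, recall, f1

true = ["A", "A", "B", "B", "C", "C"]   # invented author labels
pred = ["A", "B", "B", "B", "C", "A"]   # invented classifier output
acc, prec, rec, f1 = macro_metrics(true, pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Note that with macro averaging F1 is derived from the averaged precision and recall, so it cannot exceed both; libraries such as scikit-learn instead average per-class F1 scores, which gives slightly different numbers.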
By Ivan Dychka, Mykola Onai, Andrii Severin, Cennuo Hu
DOI: https://doi.org/10.5815/ijcnis.2024.01.05, Pub. Date: 8 Feb. 2024
For the implementation of error-correcting codes and cryptographic algorithms, and for the construction of homomorphic privacy-preserving methods, operations on the elements of GF(2^m) must have low computational complexity. This paper analyzes existing methods of performing operations on the elements of GF(2^m) and proposes a new method based on a sparse table of the field's elements. The object of research is the execution of operations in information security systems. The subject of research is methods and algorithms for performing operations on the elements of GF(2^m). The purpose of this research is to develop and improve such methods and algorithms in order to reduce their computational complexity. Empirical methods and methods of mathematical and software modeling are used in the research. Existing and proposed algorithms are implemented in C# in the Visual Studio 2015 development environment. Experimental research on the existing and developed algorithms was carried out according to the proposed methodology, which levels out the influence of additional parameters on the results. The research shows that using a sparse table of field elements is expedient: this approach reduces the RAM required for software and hardware implementation compared to the classical tabular method, which stores a full table mapping the polynomial and index representations of the field elements. In addition, the proposed method is more than 4 times faster for computing the multiplicative inverse and for exponentiation.
As a result, the proposed method reduces the computational complexity of error-correcting codes, cryptographic algorithms, and homomorphic privacy-preserving methods.
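For context, the classical tabular method the paper improves on can be sketched as follows: full log/antilog tables for GF(2^8) (reduction polynomial x^8 + x^4 + x^3 + x + 1, as used in AES) give O(1) multiplication, inversion, and exponentiation at the cost of storing both tables in full. The paper's sparse-table method is not reproduced here; this sketch only shows the baseline it is compared against.

```python
# Classical full-table arithmetic in GF(2^8): build antilog (exp) and log
# tables once, then multiply and invert with table lookups.
def build_tables(poly=0x11B):
    """Full exp/log tables for GF(2^8) with generator 0x03."""
    exp, log = [0] * 255, [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x ^= x << 1          # multiply x by the generator 0x03
        if x & 0x100:
            x ^= poly        # reduce modulo the field polynomial
    return exp, log

def gf_mul(a, b, exp, log):
    """O(1) multiplication via log/antilog lookup."""
    if a == 0 or b == 0:
        return 0
    return exp[(log[a] + log[b]) % 255]

def gf_inv(a, exp, log):
    """O(1) multiplicative inverse: a^(-1) = g^(255 - log_g(a))."""
    return exp[(255 - log[a]) % 255]

exp, log = build_tables()
print(hex(gf_inv(0x53, exp, log)))        # 0xca (the AES example pair)
print(hex(gf_mul(0x53, 0xCA, exp, log)))  # 0x1
```

The two tables cost 511 bytes here; for larger m the full tables grow as 2^m entries each, which is exactly the memory pressure a sparse table of field elements is meant to relieve.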
By Oleksandr Mediakov, Victoria Vysotska, Dmytro Uhryn, Yuriy Ushenko, Cennuo Hu
DOI: https://doi.org/10.5815/ijmecs.2024.01.03, Pub. Date: 8 Feb. 2024
The article develops a technology for generating song-lyric continuations using large language models, in particular the T5 model, to speed up, supplement, and add flexibility to the process of writing song lyrics, with or without taking into account the style of a particular author. To create the dataset, 10 different artists were selected and their lyrics collected, yielding 626 unique songs in total. After splitting each song into several input-output pairs, 1874 training instances and 465 test instances were obtained. Two language models, NSA and SA, were fine-tuned for the task of generating song lyrics; for both, t5-base, a version of T5 with 223 million parameters, was chosen as the base model. Analysis of the original data showed that the NSA model degrades less, while the SA model requires balancing the amount of text per author. Several text metrics, such as BLEU, RougeL, and RougeN, were computed to compare the models and generation strategies quantitatively. The BLEU metric is the most varied, and its value changes significantly depending on the strategy, whereas the Rouge metrics show less variability and a smaller range of values. In total, 8 different decoding methods for text generation supported by the transformers library were compared: greedy search, beam search, diverse beam search, multinomial sampling, beam-search multinomial sampling, top-k sampling, top-p sampling, and contrastive search. The lyric comparisons show that the best generation method is beam search and its variations, including beam-search multinomial sampling. Contrastive search usually outperformed the plain greedy approach. The top-p and top-k methods have no clear advantage over each other and produced different results in different situations.
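As a toy illustration of why beam search tends to beat greedy decoding in comparisons like the one above: greedy commits to the locally best token and can miss a higher-probability sequence. The probability table below is invented, not taken from the article's T5 models.

```python
# Toy decoder comparison: greedy search vs. beam search over an invented
# next-token probability table keyed by the current prefix.
MODEL = {
    (): {"A": 0.5, "B": 0.4, "C": 0.1},
    ("A",): {"A": 0.4, "B": 0.3, "C": 0.3},
    ("B",): {"A": 0.9, "B": 0.05, "C": 0.05},
    ("C",): {"A": 0.4, "B": 0.3, "C": 0.3},
}

def greedy(steps=2):
    """Always take the single most probable next token."""
    seq, score = (), 1.0
    for _ in range(steps):
        tok, p = max(MODEL[seq].items(), key=lambda kv: kv[1])
        seq, score = seq + (tok,), score * p
    return seq, score

def beam(width=2, steps=2):
    """Keep the `width` best partial sequences at every step."""
    beams = [((), 1.0)]
    for _ in range(steps):
        candidates = [
            (seq + (tok,), score * p)
            for seq, score in beams
            for tok, p in MODEL[seq].items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
    return beams[0]

print(greedy())  # ('A', 'A') with probability 0.2 -- locally best first token
print(beam())    # ('B', 'A') with probability 0.36 -- globally better sequence
```

Greedy grabs "A" (0.5) and ends at probability 0.2; beam search also keeps "B" (0.4), whose strong continuation yields 0.36. The sampling-based variants in the article (multinomial, top-k, top-p) replace the deterministic argmax with draws from a truncated distribution.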