Information Engineering for Fake Job Postings Classification in Electronic Business Based on Machine Learning Technology

By Markiian-Mykhailo Paprotskyi Victoria Vysotska Lyubomyr Chyrun Yuriy Ushenko Zhengbing Hu Dmytro Uhryn

DOI: https://doi.org/10.5815/ijieeb.2025.05.07, Pub. Date: 8 Oct. 2025

This study investigates the application of machine learning methods for the classification of fraudulent job postings in e-business platforms. Using the publicly available fake_job_postings.csv dataset, textual and categorical features of vacancies were processed and vectorised through TF-IDF, HashingVectorizer, and optimised TF-IDF. Eight machine learning algorithms were compared, including Logistic Regression, Random Forest, Gradient Boosting, Decision Tree, Multinomial Naive Bayes, Linear SVC, K-Nearest Neighbours, and XGBoost. The experiments demonstrate that XGBoost achieved the best performance (Accuracy = 0.990, Precision = 0.982, Recall = 0.998, F1 = 0.990) across all feature representations. Its superior results can be attributed to the ability of boosted ensembles to capture complex non-linear relationships in high-dimensional feature spaces while maintaining robustness against noise and class imbalance.
However, it should be noted that the evaluation was performed on a single static dataset. While the high recall shows the model’s ability to reliably detect fraudulent ads in this context, questions remain about its generalisability. Fraud tactics evolve rapidly, and new job scams may significantly differ from patterns in the training data. This creates a potential risk of overfitting to dataset-specific features, which limits direct transfer to real-world scenarios without continuous retraining and monitoring. The practical contribution of the study is a reproducible framework that integrates text and categorical processing, vectorisation, hyperparameter optimisation, and comparative model benchmarking. Such a framework could be embedded into online job platforms to support automated filtering of suspicious ads. Still, its deployment requires additional measures: periodic retraining with updated data, integration with platform APIs, and the inclusion of explainability modules to ensure transparency and user trust. Overall, the research demonstrates that ensemble-based models, particularly XGBoost, offer strong potential for fraud detection in the e-business labour market. At the same time, further work is necessary to validate model robustness on unseen and evolving fraudulent job posting strategies, ensuring scalability and reliability in production environments.

[...] Read more.

Sentiment Analysing and Visualising Public Opinion on Political Figures across YouTube and Twitter Using NLP and Machine Learning

By Victoria Vysotska Alina Starchenko Lyubomyr Chyrun Zhengbing Hu Yuriy Ushenko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijigsp.2025.05.08, Pub. Date: 8 Oct. 2025

The study is devoted to the analysis of public sentiment towards Ukrainian political figures based on comments on social media, in particular, YouTube and Twitter. The work aims to identify differences in the perception of political leaders and to understand how the platform affects the tone of statements. The main research question is to determine how public opinion about politicians in Ukraine differs between YouTube and Twitter during the full-scale war. To do this, a corpus of comments and tweets from 2022 to 2023 was collected, which went through pre-processing stages (including cleaning up slang and spelling mistakes). The article presents the results of a comprehensive analysis of public opinion on five public figures of Ukraine (S. Prytula, P. Poroshenko, V. Zelensky, S. Sternenko, A. Yermak) based on data from the social networks YouTube and Twitter. For data collection, the YouTube Data API and the Apify platform were used, a corpus of Ukrainian-language comments and tweets was collected and processed, which went through the stages of purification, normalisation and lemmatisation, taking into account slang, surzhyk and spelling mistakes. The sentiment analysis model, built on the basis of multilingual-e5-base embeddings and the XGBClassifier algorithm, showed an accuracy of 89.4%, macro-F1 of 88.7%, and a weighted F1 of 89.1%. Sentiment distribution analysis revealed that, on average, 42% of messages were positive, 36% were negative, and 22% were neutral. Twitter had a higher share of negative statements (up to 40%), while YouTube had a predominance of positive sentiment (up to 47%). The results indicate differences in the perception of public figures on different platforms and confirm the effectiveness of the developed approach for the Ukrainian-speaking segment of social networks. The results indicate significant differences in sentiment distribution: comments on YouTube are more likely to be marked by emotional intensity and harshness. At the same time, Twitter exhibits a more concise but no less polarised discourse. One of the reasons for this difference may be the difference in the format of the platforms, their audience, and the speed of content distribution. Further research should take into account the impact of user demographic biases, as well as the activity of bots or coordinated campaigns that can change the perception of public opinion. The practical significance of the study lies in the fact that its results can be used by politicians, journalists, and public figures to better understand the mood of society, predict reactions to political events, and build more effective communication. At the same time, it is worth noting that there are limitations: automated sentiment analysis has difficulty detecting sarcasm, irony, or context-sensitive meanings, which can affect the Accuracy of the results. In addition, the study takes into account the ethical aspects of data collection and analysis: only publicly available comments were used, without interference in the private sphere of users. There are possible risks of abuse of such technologies, and the need for responsible application of the findings is emphasised.

[...] Read more.

Local Agentic RAG-Based Information System Development for Intelligent Analysis of GitHub Code Repositories in Computer Science Education

By Zhengbing Hu Markiian-Mykhailo Paprotskyi Victoria Vysotska Lyubomyr Chyrun Yuriy Ushenko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijmecs.2025.05.07, Pub. Date: 8 Oct. 2025

This study presents the development and evaluation of a local agent-based Retrieval-Augmented Generation (Agentic RAG) system designed for the intelligent analysis of GitHub repositories in computer science education and IT practice. The novelty of this work lies not in inventing a new RAG algorithm, but in orchestrating multiple existing components (LangChain, Redis, SentenceTransformer, and LLMs) into a multi-stage agent pipeline with integrated relevance evaluation, specifically adapted to offline repository mining. The proposed pipeline consists of four sequential stages: (1) query reformulation by a dedicated LLM agent, (2) semantic retrieval using SentenceTransformer embeddings stored in Redis, (3) response generation by a second LLM, and (4) relevance scoring through a verification agent with retry logic. Relevance is assessed via cosine similarity and LLM-based scoring, allowing iterative refinement of answers. Experimental testing compared the system against two baselines: keyword search and a non-agentic single-stage RAG pipeline. Results showed an average MRR@10 of 0.72, compared to 0.48 for keyword search and 0.61 for non-agentic RAG, representing a 33% relative improvement in retrieval quality. Human evaluators (n=15, computer science students) rated generated explanations on a 5-point Likert scale; the proposed system achieved an average 4.3/5 for clarity and correctness, compared to 3.6/5 for the baseline. Precision@5 for code retrieval improved from 0.54 (keyword) and 0.67 (non-agentic RAG) to 0.76 in the proposed system. Average query latency in the local environment was 3.8 seconds, indicating acceptable performance for educational and small-team IT use cases. The system demonstrates high autonomy by operating fully on-premises with only optional API access to LLMs, ensuring privacy and independence from cloud providers. Ease of use was measured through a System Usability Scale (SUS) questionnaire, yielding a score of 78/100, reflecting positive user perception of the Streamlit interface and minimal setup requirements. Nevertheless, several limitations were observed: the high computational cost of running embeddings and LLMs locally, potential hallucinations in generated explanations (particularly for complex or unfamiliar code), and the inability of vector search to fully capture code syntax and control flow structures. Furthermore, while the Analytic Hierarchy Process (AHP) was applied to select the system architecture, future work should complement this with benchmark-driven evaluations for greater objectivity. The contribution of this study is threefold: (1) introducing a multi-agent orchestration logic tailored to educational code repositories; (2) empirically demonstrating measurable gains in retrieval quality and explanation usefulness over baselines; and (3) highlighting both opportunities and limitations of deploying autonomous RAG systems locally. The proposed technology can benefit IT companies seeking secure in-house tools for repository analysis, universities aiming to integrate intelligent assistants into programming courses, and research institutions requiring reproducible, privacy-preserving environments for code exploration.

[...] Read more.

Smart Tool for Identifying Misinformation Spread Sources and Routes in Social Networks Based on NLP and Machine Learning

By Victoria Vysotska Sofiia Popp Viktoriia Bulatova Zhengbing Hu Yuriy Ushenko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijcnis.2025.05.08, Pub. Date: 8 Oct. 2025

This article presents a method for detecting disinformation in news texts based on a combination of classic machine learning algorithms and deep learning models. The proposed approach was tested on the corpus of Ukrainian- and English-language news with the "fake/truth" classes marked. Before modelling, detailed data pre-processing was performed: deletion of duplicates, cleaning of HTML tags, links and special characters, normalisation of texts, unification of labels, class balancing, and tokenisation. A hybrid approach was used for vectorisation: frequency features (TF-IDF) were combined with contextual vector representations based on the IBM Granite multilingual model. Logistic regression is chosen as a classifier, which allows a balance to be achieved between quality and interpretation of results. Standard metrics are used to assess performance, such as Accuracy, Precision, Recall, F1-score, and ROC-AUC. According to the results of experiments, the model showed an Accuracy in the range of 0.91–0.93, a Precision of 0.89, a Recall of 0.92, an F1-score of 0.90, as well as an ROC-AUC over 0.94. The obtained values demonstrate the balanced ability of the system not only to accurately classify news, but also to minimise false positives, which is especially important in the conditions of information warfare. Priority is given to Recall's high scores, as the omission of fake messages can have critical consequences for information security. Thus, the proposed approach makes a scientific contribution to the field of automated disinformation detection by combining transparent and reproducible data processing with a hybrid text representation. The uniqueness of the study lies in the adaptation of NLP and machine learning methods to the Ukrainian-language information space and the context of modern hybrid warfare, which allows you to effectively identify the sources and routes of spreading fake news.

[...] Read more.

Smart Application for Recruiting Based on Natural Language Processing Methods, Transformer Models and Siamese Neural Network Architecture

By Diana Vyshotravka Victoria Vysotska Zhengbing Hu Dmytro Uhryn Yuriy Ushenko Kyrylo Smelyakov

DOI: https://doi.org/10.5815/ijisa.2025.05.07, Pub. Date: 8 Oct. 2025

This study presents a deep learning-based approach to automated resume and job matching that uses semantic similarity between texts. The solution is based on SimCSE RoBERTa transformer embeddings and a Siamese neural architecture trained using the MSELoss loss function. Unlike traditional filtering systems by keywords or characteristics, the proposed model learns to place semantically compatible pairs (resume-vacancy) in a common vector space. Unlike traditional keyword-based or attributive matching systems, our method is designed to capture deep semantic alignment between resumes and job descriptions. To evaluate the effectiveness of this architecture, we conducted extensive experiments on a labelled dataset of over 7,000 resume–vacancy pairs obtained from the HuggingFace repository. The dataset includes three classes (Good Fit, Potential Fit, No Fit), which we restructured into a binary classification task. Annotation labels reflect textual compatibility based on skills, responsibilities, and experience, ensuring task relevance.
It resulted in a moderately imbalanced dataset with approximately 66% positive and 34% negative examples. Labels were assigned based on semantic compatibility, including skill match, job responsibilities, and experience alignment. Our model achieved accuracy = 72%, precision = 70%, recall = 74%, F1-score = 72%, and Precision@10 = 75%, significantly outperforming both classical (TF-IDF + cosine similarity) and neural (Sentence-BERT without fine-tuning) baselines. These results validate the empirical effectiveness of our architecture for candidate ranking and selection. To justify the use of a complex Siamese architecture, the system was compared to two baselines: (1) a classical TF-IDF + cosine similarity method, and (2) a pretrained Sentence-BERT model without task-specific fine-tuning. The proposed model significantly outperformed both baselines across all evaluation metrics, confirming that its complexity translates to meaningful performance gains. A basic self-learning mechanism is implemented and functional. Recruiters can provide binary feedback (Fit / No Fit) for each recommended candidate, which is stored in a feedback table. This feedback can be used to retrain or fine-tune the model periodically, enabling adaptive behaviour over time. While initial retraining experiments were conducted offline, full automation and continuous integration of feedback into training pipelines remain a goal for future development. The system offers sub-5-second response times, integration with vector databases, and a web-based user interface. It is designed for use in HR departments, recruiting agencies, and employment platforms, with potential for broader commercial deployment and domain adaptation. We additionally implemented a feedback-driven retraining loop that enables future self-supervised adaptation. While UI and vector retrieval infrastructure were developed to support prototyping and deployment, the primary research innovation centres on the modelling framework, learning setup, and comparative evaluation methodology. This work contributes to the advancement of semantically-aware intelligent recruiting systems and offers a replicable baseline for future studies in neural recommendation for HR applications. The risks of algorithmic bias are emphasised separately: even in the absence of obvious demographic characteristics in the input data, the model can implicitly reproduce social or historical inequalities inherent in the data. In this regard, the study outlines areas for further development, in particular equity auditing, bias reduction techniques, and the integration of human validation in decision-making.

[...] Read more.

Information Technology for Modelling Social Trends in Telegram Using E5 Vectors and Hybrid Cluster Analysis

By Roman Lynnyk Victoria Vysotska Zhengbing Hu Dmytro Uhryn Liliia Diachenko Kyrylo Smelyakov

DOI: https://doi.org/10.5815/ijitcs.2025.04.07, Pub. Date: 8 Aug. 2025

The article presents a modern approach to analysing public opinion based on Ukrainian-language content from Telegram channels. This study presents a hybrid clustering approach that combines DBSCAN and K-means algorithms to analyse vectorised Ukrainian-language social media posts in order to detect public opinion trends. The methodology relies on a multilingual neural network–based text vectorisation model, which enables effective representation of the semantic content of posts. Experiments conducted on a corpus of 90 Ukrainian-language messages (collected between March and May 2025) allowed for the identification of six principal thematic clusters reflecting key areas of public discourse. Despite the small volume of the corpus (90 messages), the sample is structured and balanced by topic (news, vacancies, gaming), which allows you to test the effectiveness of the proposed methodology in conditions of limited data. This approach is appropriate in the case of the analysis of short texts in low-resource languages, where large-scale corpora are not available. A special advantage of this approach is the use of semantic vector representation and the construction of graphs of lexical co-occurrence networks (term co-occurrence networks), which demonstrate a stable topological structure even with small amounts of data. It allows you to identify latent topic patterns and coherent clusters that have the potential to scale to broader corpora. The authors acknowledge the limitations associated with sample size, but emphasise the role of this study as a pilot stage for the development of a universal, linguistically adaptive method for analysing public discourse. In the future, it is planned to expand the body to increase the representativeness and accuracy of the conclusions. The paper proposes a hybrid method of automatic thematic cluster analysis of short texts in social media, in particular Telegram. Vectorisation of Ukrainian-language messages is implemented using the transformer model multilingual-e5-large-instruct. A combination of HDBSCAN and K-means algorithms was used to detect clusters. More than 36,000 messages from three Telegram channels (news, games, vacancies) were analysed, and six main thematic clusters were identified. To identify thematic trends, a hybrid clustering approach was used, in which the HDBSCAN algorithm was used at the first stage to identify dense clusters and identify "noise" points, after which K-means were used to reclassify residual ("noise") embeddings to the nearest cluster centres.
Such a two-tier strategy allows you to combine the advantages of flexible allocation of free-form clusters from HDBSCAN and stable classification of less pronounced groups through K-means. It is especially effective when working with fragmented short texts of social networks. To validate the quality of clustering, both visualisation tools (PCA, t-SNE, word clouds) and quantitative metrics were used: Silhouette Score (0.41) and Davis-Boldin index (0.78), which indicate moderate coherence and resolution of clusters. Separately, the high level of "noise" in HDBSCAN (34.2%) was analysed, which may be due to the specifics of short texts, model parameters, or stylistic fragmentation of Telegram messages. The results obtained show the effectiveness of combining modern vectorisation models with flexible clustering methods to identify structured topics in fragmented Ukrainian-language content of social networks. The proposed approach has the potential to further expand to other sources, types of discourse, and tasks of digital sociology. As a result of processing 90 messages received from three different channels (news, gaming content, and vacancies), six main thematic clusters were identified. The largest share is occupied by clusters related to employment (28.2%) and security-patriotic topics (24.7%). The average level of "noise" after the initial HDBSCAN clustering was 34.2%. Additional analysis revealed that post lengths varied significantly, ranging from short announcements (average of 10 words) to analytical texts (over 140 words). Visualisations (timelines, PCA, t-SNE, word clouds, term co-occurrence graphs) confirm the thematic coherence of clusters and reveal changes in thematic priorities over time. The proposed system is an effective tool for detecting information trends in the environment of short, fragmented texts and can be used to monitor public sentiment in low-resource languages.

[...] Read more.

Smart Tool for Text Content Analysis to Identify Key Propaganda Narratives and Disinformation in News Based on NLP and Machine Learning

By Maryna Nyzova Victoria Vysotska Lyubomyr Chyrun Zhengbing Hu Yuriy Ushenko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijcnis.2025.04.08, Pub. Date: 8 Aug. 2025

The paper presents the development of a smart tool for automated analysis of news text content in order to identify propaganda narratives and disinformation. The relevance of the project is due to the growth of the information threat in the context of a hybrid war, in particular in the Ukrainian information space. The proposed solution is implemented in the form of a browser plugin that provides instant analysis of content without the need to switch to third-party services. The methodology is based on the use of modern natural language processing (NLP) and deep learning methods (in particular, BERT models) to classify content according to the level of propaganda impact and identify key narratives. As part of the study, modern models of transformers for text analysis, in particular BERT, were used. For the task of classifying propaganda, pre-trained GloVe vectors optimised for news articles were used, which provided the best results among the options considered. Instead, the BERT model was used to classify narratives, which showed higher accuracy in the processing of texts reflecting subjective thoughts. The adaptation included the use of a multilingual version of BERT (multilingual BERT), as it allows you to effectively work with Ukrainian-language data, which is a key advantage for localised analysis in the context of information warfare. Before using BERT, pre-processing of texts was carried out with the addition of syntactic, punctuation, emotional and stylistic features, which increased the accuracy of classification. For a more complete and reliable assessment of the effectiveness of propaganda classification models and narratives, a set of key metrics was used for propaganda/ narratives analyses Accuracy (0.94/0.86), Precision (0.95/0.69), Recall (0.96/0.71) and F1-score (0.96/0.70).The developed model showed high accuracy results: the F1-score for the propaganda classification problem was 0.96 and for the narrative classification problem – 0.70, which significantly exceeds the results of similar approaches, in particular XGBoost (0.92 and 0.50, respectively). In addition, the system supports full-fledged work with Ukrainian-language content, which is its key competitive advantage. The practical application of the tool covers journalism, fact-checking, analytics, and improving media literacy among citizens, contributing to the improvement of the state's information security.

[...] Read more.

Application of Tensor Networks Analysis to Optimize Traffic Management in a Critical Information and Telecommunications Network

By Oleksandr Lavrut Tetiana Lavrut Victoria Vysotska Zhengbing Hu Yuriy Ushenko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijigsp.2025.04.08, Pub. Date: 8 Aug. 2025

The article investigates the task of optimising traffic management in critical information and telecommunication networks in order to ensure a guaranteed quality of user service, particularly in emergencies. A method of tensor analysis of networks is proposed, using a formalised description of the system in the form of tensors of message lengths, delays and bandwidth of channels. The network is modelled as a simplified complex, and routing is implemented through a tensor equation of connection between network parameters in different coordinate systems. Experimental calculations using examples with dynamically variable topology have shown:

•Reduction of average multipath message delivery latency by 9–40% depending on traffic intensity,
•Probability of packet delivery at or above 0.999 under high loads (200-300 messages/s),
•Zero jitter due to the even distribution of delays between paths,
•The ability to adaptively fragment messages in nodes to reduce latency,
•Increasing the efficiency of resource use compared to single-track models.

The use of a tensor apparatus provides stable and scalable routing in an unstable network topology. The method allows you to take into account the heterogeneity of traffic, adapt to the loss of nodes or channels, and maintain guarantees of quality of service in real time. The proposed approach is of practical importance for information and telecommunication systems used in emergencies, in particular for coordinating the actions of emergency rescue services, emergency medicine, civil protection, military units, control of drones and robotic means in the face of infrastructure loss. Potential stakeholders include state and municipal security services, operators of critical networks (energy, transport, healthcare), developers of automated control systems, and manufacturers of secure communication equipment. The proposed method can be integrated into decentralised networks with limited resources and variable topology, where traditional routing approaches do not guarantee sufficient quality of service.

[...] Read more.

Comprehensive Intellectual Analysis of Statistical Data on Leading Energy Companies’ Actions

By Viktoriia Bulatova Sofiia Popp Victoria Vysotska Yuriy Ushenko Zhengbing Hu Dmytro Uhryn

DOI: https://doi.org/10.5815/ijieeb.2025.04.07, Pub. Date: 8 Aug. 2025

The paper conducted a comprehensive analysis of the time series of stock prices of three leading energy companies – Shell, BP and ExxonMobil – for the period from January 2021 to January 2025. At the initial stage, data quality was checked: dates were set as indices, the absence of duplicates and missing values was confirmed, and descriptive statistics (mean, variance, skewness and kurtosis) were calculated. Next, the trends of adjusted closing prices (AdjClose) were analysed using moving averages (SMA14, SMA50), exponential smoothing, moving volatility (30-day standard deviation) and cumulative returns. It was found that еhe price dynamics growth has accelerated since 2022 against the background of the energy crisis caused by the war in Ukraine: ExxonMobil’s cumulative return reached ≈250% by mid-2022 and ≈350% at the beginning of 2025, Shell and BP, respectively ≈220% and ≈200% by 2024. Correlation analysis showed that BP and Shell have the most significant interdependence (r = 0.87, R² = 0.75). The autocorrelation method established high non-stationarity of the time series (ACF about one at low lags). K-Means clustering (k = 2) allowed us to distinguish periods of active growth and relative price consolidation, although the feature selection behind this clustering requires further clarification. The initially reported financial metrics (Sharpe, Sortino, and Calmar ratios) were significantly overstated due to unit errors, specifically, using percentage values as absolute figures. After applying appropriate annualization and decimal scaling performance indicators were obtained for ExxonMobil – CAGR = 36.84%, Sharpe ≈ 1.24, Sortino ≈ 1.9–2.5, Max Drawdown = 20.51%, Calmar ≈ 1.80; Shell: CAGR = 21.29%, Sharpe ≈ 0.76, Sortino ≈ 1.2–1.5, Max Drawdown = 25.04%, Calmar ≈ 0.85; BP: CAGR = 14.54%, Sharpe ≈ 0.53, Sortino ≈ 0.9–1.2, Max Drawdown = 26.23%, Calmar ≈ 0.55. The study confirms that ExxonMobil showed the most stable and substantial growth during the examined period, while BP exhibited the highest volatility. Shell demonstrated an intermediate performance level. The close correlation between Shell and BP is attributed to the similarity in their geographical market activity and stock behaviour. The choice of these methods of analysis is due to the desire to assess the behaviour of stocks during the period of increased market volatility caused by the energy crisis, geopolitical risks and changes in investor priorities. Technical analysis allows you to identify short- and medium-term patterns, clustering allows you to automatically separate market phases without the need for subjective hypotheses, and statistical metrics will enable you to compare the performance of assets within the industry. This research contributes to the broader field of financial analysis by demonstrating how machine learning and technical analytics tools can be applied to assess the resilience and relationships of assets during periods of market turmoil. The results can be helpful for institutional investors, financial analysts, and portfolio managers looking to adapt strategies to dynamic energy market conditions.

[...] Read more.

Development and Testing of Voice User Interfaces Based on BERT Models for Speech Recognition in Distance Learning and Smart Home Systems

By Victoria Vysotska Zhengbing Hu Nikita Mykytyn Olena Nagachevska Kateryna Hazdiuk Dmytro Uhryn

DOI: https://doi.org/10.5815/ijcnis.2025.03.07, Pub. Date: 8 Jun. 2025

Voice User Interfaces (VUIs) focus on their application in IT and linguistics. Our research examines the capabilities and limitations of small and multilingual BERT models in the context of speech recognition and command conversion. We evaluate the performance of these models through a series of experiments, including the application of confusion matrices to assess their effectiveness. The findings reveal that larger models like multilingual BERT theoretically offer advanced capabilities but often demand more substantial resources and well-balanced datasets. Conversely, smaller models, though less resource-intensive, may sometimes provide more practical solutions. Our study underscores the importance of dataset quality, model fine-tuning, and efficient resource management in optimising VUIS. Insights gained from this research highlight the potential of neural networks to enhance and improve user interaction. Despite challenges in achieving a fully functional interface, the study provides valuable contributions to the VUIs development and sets the stage for future advancements in integrating AI with linguistic technologies. The article describes the development of a voice user interface (VUI) capable of recognising, analysing, and interpreting the Ukrainian language. For this purpose, several neural network architectures were used, including the Squeezeformer-CTC model, as well as a modified w2v-bert-2.0-uk model, which was used to decode speech commands into text. The multilingual BERT model (mBERT) for the classification of intentions was also tested. The developed system showed the prospects of using BERT models in combination with lightweight ASR architectures to create an effective voice interface in Ukrainian. Accuracy indicators (F1 = 91.5%, WER = 12.7%) indicate high-quality recognition, which is provided even in models with low memory capacity. The system is adaptable to conditions with limited resources, particularly for educational and living environments with a Ukrainian-speaking audience.

[...] Read more.

Intelligent Application for Predicting Diabetes Spread Risk in the World Based on Machine Learning

By Dmytro Uhryn Victoria Vysotska Daryna Zadorozhna Mariia Spodaryk Kateryna Hazdiuk Zhengbing Hu

DOI: https://doi.org/10.5815/ijisa.2025.03.06, Pub. Date: 8 Jun. 2025

This paper presents the development and implementation of an intelligent system for predicting the risk of diabetes spread using machine learning techniques. The core of the system relies on the analysis of the Pima Indians Diabetes dataset through k-nearest neighbours (k-NN), Random Forest, Logistic Regression, Decision Trees and XGBoost algorithms. After pre-processing the data, including normalization and handling missing values, the k-NN model achieved an accuracy of 77.2%, precision of 80.0%, recall of 85.0%, F1-score of 83.0% and ROC of 81.9%. The Random Forest model achieved an accuracy of 81.0%, precision of 87.0%, recall of 91.0%, F1-score of 89.0% and ROC of 90.0%. The Logistic Regression model achieved an accuracy of 60.0%, precision of 93.0%, recall of 61.0%, F1-score of 74.0% and ROC of 69.0%. The Decision Trees model achieved an accuracy of 79.0%, precision of 87.0%, recall of 89.0%, F1-score of 88.0% and ROC of 83.0%. In comparison, the XGBoost model outperformed with an accuracy of 83.0%, precision of 85.0%, recall of 96.0%, F1-score of 90.0% and ROC of 91.0%, indicating strong prediction capabilities. The proposed system integrates both hardware (continuous glucose monitors) and software (AI-based classifiers) components, ensuring real-time blood glucose level tracking and early-stage diabetes risk prediction. The novelty lies in the proposed architecture of a distributed intelligent monitoring system and the use of ensemble learning for risk assessment. The results demonstrate the system's potential for proactive healthcare delivery and patient-centred diabetes management.

[...] Read more.

Agile Technology of Information Data Engineering for Intelligent Analysis of the Happiness Index and Life Satisfaction in Known World Cities

By Yuriy Ushenko Victoria Vysotska Daryna Zadorozhna Mariia Spodaryk Zhengbing Hu Dmytro Uhryn

DOI: https://doi.org/10.5815/ijieeb.2025.03.07, Pub. Date: 8 Jun. 2025

This paper presents the development of an intelligent information system for analysing the happiness index and life satisfaction based on sociological survey data from various countries. The research addresses the need to improve the accuracy and efficiency of social research by integrating data mining and machine learning methods – specifically K-means clustering and multiple regression analysis – into the system design. The proposed module enables automated classification of countries and cities by life satisfaction levels, allowing stakeholders to make informed decisions on urban planning and social policy. The system also facilitates the identification of favourable living environments, providing valuable insights into the social, economic, and environmental factors affecting well-being. The experimental results on real-world datasets confirm the module’s effectiveness and predictive capabilities.

[...] Read more.

Agile Methodology of Information Engineering for Semantic Annotations Categorization and Creation in Scientific Articles Based on NLP and Machine Learning Methods

By Danylo Levkivskyi Victoria Vysotska Lyubomyr Chyrun Yuriy Ushenko Dmytro Uhryn Cennuo Hu

DOI: https://doi.org/10.5815/ijieeb.2025.02.01, Pub. Date: 8 Apr. 2025

Research devoted to the categorization and creation of semantic annotations for scientific articles stands out as an essential direction of development in the context of the growing volume of scientific literature. The application of machine learning and natural language processing in this field allows you to effectively organize and provide access to scientific information. The article discusses methods of automatic annotation of texts. Based on the review, the use of the constraint propagation model is proposed to improve the technique of text relationship maps. The developed software system is aimed at automating the process of analysis and categorization of scientific materials, which opens the way to improving the speed and accuracy of searching for the necessary information for researchers. The use of advanced machine learning models, such as roBERTa and RAG, ensures the highest quality of data processing and creation of semantic annotations. The accuracy of predicting article categories after improving the model reached 88%. The novelty of the approach is the combination of categorization and semantic annotation to increase the convenience and speed of searching for scientific information. The software system opens up opportunities for future expansion and improvement through the use of advanced technologies and machine learning models. This study is noted for its relevance, originality of approach and potential for practical application in the field of scientific research and development of science as a whole. The proposed approach contributes to the development of the Information Engineering and Electronic Business industry through the following key aspects: automation of categorization and annotation of scientific articles, improving the accuracy of information search, increasing the efficiency of scientific research, and the flexibility and scalability of the solution.

[...] Read more.

Intelligent Application for Textual Content Authorship Identification based on Machine Learning and Sentiment Analysis

By Dmytro Uhryn Victoria Vysotska Lyubomyr Chyrun Sofia Chyrun Cennuo Hu Yuriy Ushenko

DOI: https://doi.org/10.5815/ijisa.2025.02.05, Pub. Date: 8 Apr. 2025

During the development and implementation of the software system for text analysis, attention was focused on the morphological, syntactic and stylistic levels of the language, which made it possible to develop detailed profiles of authorship for various writers. The main goal of the system is to automate the process of identifying authorship and detecting plagiarism, which ensures the protection of intellectual property and contributes to the preservation of cultural heritage. The scientific novelty of the research was manifested in the development of specific algorithms adapted to the peculiarities of the natural language, as well as in the use of advanced technologies, such as deep learning and big data. The introduction of the interdisciplinary approach, which combines computer science, linguistics, and literary studies, has opened up new perspectives for the detailed analysis of scholarly works. The results of the work confirm the high efficiency and accuracy of the system in authorship identification, which can serve as an essential tool for scientists, publishers, and law enforcement agencies. In addition to technical aspects, it is vital to take into account ethical issues related to confidentiality and copyright protection, which puts under control not only the technological side of the process but also moral and legal norms. Thus, the work revealed the importance and potential of using modern text processing methods for improving literary analysis and protecting cultural heritage, which makes it significant for further research and practical use in this area.

[...] Read more.

Agile Intelligent Information Technology for Speech Synthesis Based on Transfer Function Approximation Methods Using Continued Fractions

By Lyubomyr Chyrun Victoria Vysotska Sofia Chyrun Zhengbing Hu Yuriy Ushenko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijigsp.2025.02.01, Pub. Date: 8 Apr. 2025

The study considers the methodology of using continued fractions to approximate transfer functions in speech synthesis systems. The main results of the research are an increase in the accuracy of approximation, acceleration of calculations, and a new method of convergence analysis. The use of continued fractions allowed for a reduction in the error of approximation of transfer functions compared to classical methods. With an error of 1.0E-06, the continued fraction method requires only 3–13 terms, while the power series requires 3–15 terms. The use of continued fractions reduced the time for calculating transfer functions by 2–3%. It was determined that the most effective for calculating the values of continued fractions are the Δ-algorithm and the α-algorithm. A new criterion for the convergence of continued fractions is proposed, which allows the sum fractions that are "divergent" in the classical sense. The graphs used to classify different types of continued fractions allowed us to better understand their structure and potential for application in speech synthesis. Software for calculating transfer function values based on continued fraction decomposition has been developed and tested. It has allowed automation of the approximation process and increased the efficiency of speech synthesis systems. The results obtained have allowed improving the quality of synthesised speech while simultaneously reducing the complexity of calculations. Systems using continued fractions consume less memory and provide more accurate voice reproduction. In summary, the work presents a new approach to the approximation of transfer functions, which is essential for optimising speech synthesis systems.

[...] Read more.

Agile Intelligent Software Solution for Textual Content Authorship Identification Based on NLP, Artificial Intelligence and Machine Learning

By Zhengbing Hu Victoria Vysotska Lyubomyr Chyrun Roman Romanchuk Yuriy Ushenko Dmytro Uhryn Cennuo Hu

DOI: https://doi.org/10.5815/ijmecs.2025.02.02, Pub. Date: 8 Apr. 2025

The main goal of the work is to create an intelligent system that uses NLP methods and machine learning algorithms to analyse and classify textual content authorship. The following machine learning models for English and Ukrainian publications were tested and trained on the dataset: Support Vector Classifier, Random Forest, Naive Bayes, Logistic Regression and Neuron Networks. For English, the accuracy of the models was higher due to the more significant amount of text data available. The results for English fiction publication show that the Neuron Networks classifier outperforms the other models in all evaluated metrics, achieving the highest accuracy (0.97), recall (0.96), F1 score (0.98), and precision (0.96). It shows that Neuron Networks is particularly effective in capturing distinctive features of the writing styles of different English authors in scientific and technical texts. For the Ukrainian language, there is a drop in accuracy by 5-10% due to the smaller number of corpora of texts for teaching. The results for scientific and technical Ukrainian publications show that the Random Forest classifier outperforms the other models in all evaluated metrics, achieving the highest accuracy (0.88), recall (0.87), F1 score (0.87), and precision (0.87). It shows that Random Forest is particularly effective in capturing distinctive features of the writing styles of different Ukrainian authors in scientific and technical texts. Much worse accuracy results were shown by other models such as Support Vector Classifier (77%), Logistic Regression (73%) and Naive Bayes (70%). The results for the Ukrainian fiction publication show that the Random Forest classifier outperforms the other models in all evaluated metrics, achieving the highest accuracy (0.85), recall (0.84), F1 score (0.84), and precision (0.84). Much worse accuracy results were shown by other models such as Support Vector Classifier (77%), Logistic Regression (73%) and Naive Bayes (70%)

[...] Read more.

Agile Methodology for Identifying Original and Fake Printed Documents based on Secret Raster Formation

By Mariia Nazarkevych Victoria Vysotska Vasyl Lytvyn Yuriy Ushenko Dmytro Uhryn Zhengbing Hu

DOI: https://doi.org/10.5815/ijcnis.2025.02.04, Pub. Date: 8 Apr. 2025

A method of identification of original and fake prints has been developed. Security elements are printed using an offset printing method, which we will call original printing. In parallel, we will print bitmap security elements on copiers. We will call this process fake printing. Such types of rasterisation were developed to make the difference between an original print and a fake print visible to the naked eye. A method of detecting fake printing has also been developed by measuring the change in the percentage of raster dot, dot gain, trapping, optical density, ∆lab, and change in tonality. The protection of the printed document is created when the image is transformed by amplitude-modulated rasterisation based on the mathematical apparatus of Ateb-functions. During rasterisation, we create thin graphic elements that have different shapes and are calculated according to developed methods. The size of a single dot of a raster element depends on the selection of the rasterisation method and the tonal gradation value of each corresponding pixel in the image. During rasterisation, a raster structure is formed, in which the value of each raster element is related by the value of the Ateb-function with tonal gradation, as well as a change in the angle, lines and shapes of the curves of a single raster. We offer raster image printing on various paper samples that are widely used today.

[...] Read more.

Intelligent Processing Censoring Inappropriate Content in Images, News, Messages and Articles on Web Pages Based on Machine Learning

By Oleksiy Tverdokhlib Victoria Vysotska Olena Nagachevska Yuriy Ushenko Dmytro Uhryn Yurii Tomka

DOI: https://doi.org/10.5815/ijigsp.2025.01.08, Pub. Date: 8 Feb. 2025

This project aims to enhance online experiences quality by giving users greater control over the content they encounter daily. The proposed solution is particularly valuable for parents seeking to safeguard their children, educational institutions striving to foster a more conducive learning environment, and individuals prioritising ethical internet usage. It also supports users who wish to limit their exposure to misinformation, including fake news, propaganda, and disinformation. Through the implementation of a browser extension, this system will contribute to a safer internet, reducing users' vulnerability to harmful content and promoting a more positive and productive online environment. The primary objective of this work is to develop a browser extension that automatically detects and censors inappropriate text and images on web pages using artificial intelligence (AI) technologies. The extension will enable users to personalise censorship settings, including the ability to define custom prohibited words and toggle the filtering of text and images. Accuracy estimates for various classifiers such as Random Forest (0.879), Logistic Regression (0.904), Decision Tree (0.878), Naive Bayes (0.315), and KNN (0.832) were performed.

[...] Read more.

Recognizing Fakes, Propaganda and Disinformation in Ukrainian Content based on NLP and Machine-learning Technology

By Victoria Vysotska Krzysztof Przystupa Yurii Kulikov Sofiia Chyrun Yuriy Ushenko Zhengbing Hu Dmytro Uhryn

DOI: https://doi.org/10.5815/ijcnis.2025.01.08, Pub. Date: 8 Feb. 2025

The project envisages the creation of a complex system that integrates advanced technologies of machine learning and natural language processing for media content analysis. The main goal is to provide means for quick and accurate verification of information, reduce the impact of disinformation campaigns and increase media literacy of the population. Research tasks included the development of algorithms for the analysis of textual information, the creation of a database of fakes, and the development of an interface for convenient access to analytical tools. The object of the study was the process of spreading information in the media space, and the subject was methods and means for identifying disinformation. The scientific novelty of the project consists of the development of algorithms adapted to the peculiarities of the Ukrainian language, which allows for more effective work with local content and ensures higher accuracy in identifying fake news. Also, the significance of the project is enhanced by its practical value, as the developed tools can be used by government structures, media organizations, educational institutions and the public to increase the level of information security. Thus, the development of this project is of great importance for increasing Ukraine's resilience to information threats and forming an open, transparent information society.

[...] Read more.

Information Technology for Sound Analysis and Recognition in the Metropolis based on Machine Learning Methods

By Lyubomyr Chyrun Victoria Vysotska Stepan Tchynetskyi Yuriy Ushenko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijisa.2024.06.03, Pub. Date: 8 Dec. 2024

The goal of designing and implementing an intelligent information system for the recognition and classification of sound signals is to create an effective solution at the software level, which would allow analysis, recognition, classification and forecasting of sound signals in megacities and smart cities using machine learning methods. This system can help people in various fields to simplify their lives, for example, it can help farmers protect their crops from animals, in the military it can help with the identification of weapons and the search for flying objects, such as drones or missiles, in the future there is a possibility for recognizing the distance to sound, also, in cities can help with security, so a preventive response system can be built, which can check if everything is in order based on sounds. Also, it can make life easier for people with impaired hearing to detect danger in everyday life. In the part of the comparison of analogues of the developed product, 4 analogues were found: Shazam, sound recognition from Apple, Vocapia, and SoundHound. A table of comparisons was made for these analogues and the product under development. Also, after comparing analogues, a table for evaluating the effects of the development was built. During the system analysis section, a variety of audio research materials were developed to indicate the characteristics that can be used for this design: period, amplitude, and frequency, and, as an example, an article on real-world audio applications is shown. A precedent scenario is described using the RUP methodology and UML diagrams are constructed: Diagram of use cases; Class diagram; Activity chart; Sequence diagram; Diagram of components; and Deployment diagram. Also, sound data analysis was performed, sound data was visualized as spectrograms and sound waves, which clearly show that the data are different, so it is possible to classify them using machine learning methods. An experimental selection of the machine learning method as staandart clasificers for building a sound recognition model was made. The best method turned out to be SVC, the accuracy of which reflects more than 30 per cent. A neural network was also implemented to improve the obtained results. The result of training a model based on a neural network during 100 epochs achieved a result of 97.7% accuracy for training data and 47.8% accuracy when checking performance on test data. This result should be higher, so it is necessary to consider improving recognition algorithms, increasing the amount of data, and changing the recognition method. Testing of the project was carried out, showing its operation and pointing out shortcomings that need to be corrected in the future.

[...] Read more.

Modified Kalman Filter with Chebyshev Points Based on a Recurrent Neural Network for Automatic Control System Measuring Channels Diagnosing and Parring off Failures

By Serhii Vladov Oleksandr Muzychuk Victoria Vysotska Alexey Yurko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijigsp.2024.05.04, Pub. Date: 8 Oct. 2024

The article is devoted to the modified multidimensional Kalman filter with Chebyshev points development to solve the task of diagnosing and parring off failures in the measurement channels of complex dynamic objects automatic control system, which will provide a more accurate and reliable assessment of system state in the presence of outliers in the data. An implementation of the proposed modified multidimensional Kalman filter with Chebyshev points is proposed in the form of a modified recurrent neural network containing a failure diagnostics layer, a failure parry layer, a filtering and smoothing layer, and a results aggregation layer. This structure of the modified recurrent neural network made it possible to solve the main problems of the method of diagnosing and parring off failures of the measuring channels of complex dynamic objects automatic control system, such as diagnosing failures with an accuracy of 0.99802, fending off failures with an accuracy of 0.99796, and assessing the state of the system with an accuracy of 0.99798. It is proposed to use a modified loss function of a recurrent neural network as a general loss function for diagnostics, fault restoring and system state assessment, which makes it possible to avoid retraining when there are a large number of parameters or insufficient data. It has been experimentally proven that the loss function remains stable on both the training and validation data sets for 1000 training epochs and does not go beyond –2.5 % to +2.5 %, which indicates a low-risk overtraining or undertraining of the model. It has been experimentally confirmed that the use of a modified recurrent neural network in solving the task of diagnosing and parring off failures of the measuring channels of complex dynamic objects automatic control system is appropriate in comparison with a radial basis functions neural network and a multidimensional Kalman filter without a neural network implementation, based on metrics such as the root mean square deviation, mean absolute error, mean absolute percentage error, coefficient of determination for the accuracy of reproducing previous data, and coefficient of determination for the accuracy of predicting future values. For example, the value of the standard deviation of the modified recurrent neural network is 0.00226, which is 1.65 times less than the radial basis function neural network and 2.20 times less than the multidimensional Kalman filter without a neural network implementation.

[...] Read more.

Information Technology for Gender Voice Recognition Based on Machine Learning Methods

By Victoria Vysotska Denys Shavaiev Michal Gregus Yuriy Ushenko Zhengbing Hu Dmytro Uhryn

DOI: https://doi.org/10.5815/ijmecs.2024.05.05, Pub. Date: 8 Oct. 2024

The growing use of social networks and the steady popularity of online communication make the task of detecting gender from posts necessary for a variety of applications, including modern education, political research, public opinion analysis, personalized advertising, cyber security and biometric systems, marketing research, etc. This study aims to develop information technology for gender voice recognition by sound based on supervised learning using machine learning algorithms. A model, methods and means of recognition and gender classification of voice speech samples are proposed based on their acoustic properties and machine learning. In our voice gender recognition project, we used a model built based on the neural network using the TensorFlow library and Keras. The speaker’s voice was analysed for various acoustic features, such as frequency, spectral characteristics, amplitude, modulation, etc. The basic model we created is a typical neural network for text classification. It consists of the input layer, hidden layers, and the output layer. For text processing, we use a pre-trained word vector space such as Word2Vec or GloVe. We also used such techniques as dropout to prevent model overtraining, such activation functions as ReLU (Rectified Linear Unit) for non-linearity, and a softmax function in the last layer to obtain class probabilities. To train a model, we used the Adam optimizer, which is a popular gradient descent optimization method, and the “sparse categorical cross-entropy” loss function, since we are dealing with multi-class classification. After training the model, we saved it to a file for further use and evaluation of new data. The application of neural networks in our project allowed us to build a powerful model that can recognize a speaker’s gender by voice with high accuracy. The intelligent system was trained using machine learning methods with each of the methods being analysed for accuracy: K-Nearest Neighbours (98.10%), Decision Tree (96,69%), Logistic Regression (98.11%), Random Forest (96.65%), Support Vector Machine (98.26%), neural networks (98.11%). Additional techniques such as regularization and optimization can be used to improve model performance and prevent overtraining.

[...] Read more.

Intelligent Network Architecture Development for E-Business Processes Based on Ontological Models

By Yevgen Burov Victoria Vysotska Lyubomyr Chyrun Yuriy Ushenko Dmytro Uhryn Zhengbing Hu

DOI: https://doi.org/10.5815/ijieeb.2024.05.01, Pub. Date: 8 Oct. 2024

The use of ontological models for intelligent systems construction allows for improved quality characteristics at all stages of the life cycle of a software product. The main source of improvement in quality characteristics is the possibility of reusing the conceptualization and code provided by the corresponding models. Due to the use of a single conceptualization when creating various software products, the degree of interoperability and code portability increases. The new-generation electronic business analytics systems implementation is based on the use of active models for business processes (BP). Such models, on the one hand, reflect the BPs taking place in the organization on a real-time scale, and on the other hand, embody corporate and other regulatory rules and restrictions and monitor their compliance. The purpose of this article is to research the methods of presenting and building active executable BP models, determining the methods of their execution and coordination, and building the resulting intelligent network of BP models. In the process of its implementation, such a network ensures the implementation, support of decision-making and compliance with regulatory rules in the relevant real BPs. A formal specification of an intelligent system for modelling a complex of BPs of the enterprise using models has been proposed. A hierarchical approach to the introduction of intelligent functions into the modelling system has been proposed. The simulation system is designed to be used for the design and management of complex intelligent systems. Achieving the set goal involves solving several development tasks: methods of presenting BP models for different types of such models; methods of analysis and display of time relations and attributes in BP models; ways of presenting the association of artefacts, and business analytics models with individual BP operations; metric ratios for evaluating the quality of process execution; methods of interaction of various BPs and coordination of their implementation. The purpose of functioning an intelligent model-driven software system is achieved through the interaction of a large number of simple models. At the same time, each model encapsulates a certain aspect of the expert's knowledge about the subject area. To apply executable conceptual models in the field of modelling BPes, it is necessary to determine the types of conceptual models used, their purpose and functions, and the role they play in the operation of an intelligent system. Models used in modelling BPes can be classified according to various characteristics. At the same time, the same model can be included in different classifications.

[...] Read more.

Disinformation, Fakes and Propaganda Identifying Methods in Online Messages Based on NLP and Machine Learning Methods

By Victoria Vysotska Krzysztof Przystupa Lyubomyr Chyrun Serhii Vladov Yuriy Ushenko Dmytro Uhryn Zhengbing Hu

DOI: https://doi.org/10.5815/ijcnis.2024.05.06, Pub. Date: 8 Oct. 2024

A new method of propaganda analysis is proposed to identify signs and change the dynamics of the behaviour of coordinated groups based on machine learning at the processing disinformation stages. In the course of the work, two models were implemented to recognise propaganda in textual data - at the message level and the phrase level. Within the framework of solving the problem of analysis and recognition of text data, in particular, fake news on the Internet, an important component of NLP technology (natural language processing) is the classification of words in text data. In this context, classification is the assignment or assignment of textual data to one or more predefined categories or classes. For this purpose, the task of binary text classification was solved. Both models are built based on logistic regression, and in the process of data preparation and feature extraction, such methods as vectorisation using TF-IDF vectorisation (Term Frequency – Inverse Document Frequency), the BOW model (Bag-of-Words), POS marking (Part-Of-Speech), word embedding using the Word2Vec two-layer neural network, as well as manual feature extraction methods aimed at identifying specific methods of political propaganda in texts are used. The analogues of the project under development are analysed the subject area (the propaganda used in the media and the basis of its production methods) is studied. The software implementation is carried out in Python, using the seaborn, matplotlib, genism, spacy, NLTK (Natural Language Toolkit), NumPy, pandas, scikit-learn libraries. The model's score for propaganda recognition at the phrase level was obtained: 0.74, and at the message level: 0.99. The implementation of the results will significantly reduce the time required to make the most appropriate decision on the implementation of counter-disinformation measures concerning the identified coordinated groups of disinformation generation, fake news and propaganda. Different classification algorithms for detecting fake news and non-fakes or fakes identification accuracy from Internet resources ana social mass media are used as the decision tree (for non-fakes identification accuracy 0.98 and fakes identification accuracy 0.9903), the k-nearest neighbours (0.83/0.999), the random forest (0.991/0.933), the multilayer perceptron (0.9979/0.9945), the logistic regression (0.9965/0.9988), and the Bayes classifier (0.998/0.913). The logistic regression (0.9965) the multilayer perceptron (0.9979) and the Bayesian classifier (0.998) are more optimal for non-fakes news identification. The logistic regression (0.9988), the multilayer perceptron (0.9945), and k-nearest neighbours (0.999) are more optimal for identifying fake news identification.

[...] Read more.

Polymorphic Radial Basis Functions Neural Network

By Serhii Vladov Ruslan Yakovliev Victoria Vysotska Dmytro Uhryn Artem Karachevtsev

DOI: https://doi.org/10.5815/ijisa.2024.04.01, Pub. Date: 8 Aug. 2024

The work is devoted to the development of the radial basis functions (RBF networks) neural network new architecture – a polymorphic RBF network in which the one-dimensional radial basis functions (RBFs) in the hidden layer instead, multidimensional RBFs are used, which makes it possible to better approximate complex functions that depend on several independent variables. Moreover, in its second layer, the summing the RBF outputs one by one from each group instead, multiplication is used, which allows the polymorphic RBF network to better identify relations between independent variables. Based on the training classical RBF networks evolutionary algorithm, the polymorphic RBF network training algorithm was created, which, through the initializing weight coefficients methods use taking into account the tasks structure and preliminary values, using the mutations tournament selection, adding additional criteria to the fitness function to take into account stability and speed training a polymorphic RBF network, as well as using an evolutionary mutation strategy, allowed us to obtain the lowest errors in training and testing a polymorphic RBF network compared to known RBF network architectures. The created polymorphic RBF network practical application possibility is demonstrated experimentally using the helicopters turboshaft engines (using the example, the TV3-117 turboshaft engine) operating process parameters optimizing solving task using a multicriteria optimization algorithm. The optimal Pareto front was obtained, which made it possible to obtain the engine operation three additional modes: maximum reduction of specific fuel consumption at the total pressure in the compressor increase degree increased value by 5.0 %, specific fuel consumption minimization at the total pressure in the compressor increase degree reduced value by 1.0 %, the total pressure in the compressor increases degree optimal value with a slight increase in specific fuel consumption by 10.5 %. Future research prospects include adapting the developed methods and models into the general concept for monitoring and controlling helicopter turboshaft engines during flight operations. This concept is implemented in the neural network expert system and the on-board automatic control system.

[...] Read more.

Universal On-board Neural Network System for Restoring Information in Case of Helicopter Turboshaft Engine Sensor Failure

By Serhii Vladov Ruslan Yakovliev Victoria Vysotska Dmytro Uhryn Yuriy Ushenko

DOI: https://doi.org/10.5815/ijcnis.2024.04.05, Pub. Date: 8 Aug. 2024

This work focuses on developing a universal onboard neural network system for restoring information when helicopter turboshaft engine sensors fail. A mathematical task was formulated to determine the occurrence and location of these sensor failures using a multi-class Bayesian classification model that incorporates prior knowledge and updates probabilities with new data. The Bayesian approach was employed for identifying and localizing sensor failures, utilizing a Bayesian neural network with a 4–6–3 structure as the core of the developed system. A training algorithm for the Bayesian neural network was created, which estimates the prior distribution of network parameters through variational approximation, maximizes the evidence lower bound of direct likelihood instead, and updates parameters by calculating gradients of the log-likelihood and evidence lower bound, while adding regularization terms for warnings, distributions, and uncertainty estimates to interpret results. This approach ensures balanced data handling, effective training (achieving nearly 100% accuracy on both training and validation sets), and improved model understanding (with training losses not exceeding 2.5%). An example is provided that demonstrates solving the information restoration task in the event of a gas-generator rotor r.p.m. sensor failure in the TV3-117 helicopter turboshaft engine. The developed onboard neural network system implementing feasibility on a helicopter using the neuro-processor Intel Neural Compute Stick 2 has been analytically proven.

[...] Read more.

Information Technology for the Data Integration in Intelligent Systems of Business Analytics

By Victoria Vysotska Andrii Berko Yevhen Burov Dmytro Uhryn Zhengbing Hu Valentyna Dvorzhak

DOI: https://doi.org/10.5815/ijieeb.2024.04.05, Pub. Date: 8 Aug. 2024

The purpose of the research is to develop mathematical models, solution methods and layouts of tools for problems solving of integrating information resources and creation of intelligent systems of business analytics based on effective models. These problems can be solved by automating the business processes execution and introducing artificial intelligence components into the business processes management systems. It can be said that the essence of the modern stage of the business processes modelling systems development is the transition from mainly manual (or with the use of auxiliary software) methods of business processes analysis to mainly automatic management of the business processes execution, construction of intelligent business processes networks in the interconnected conceptual models’ set form that encapsulate knowledge about the structure, the business processes features, system events, limitations and dependencies and are processed by machine. Decision-making powers are delegated to such information system in clearly defined (most often simple, routine) situations. So, in this way, it is possible to form the information resource of intelligent systems of business analytics as a single coherent set of data, suitable for use in solving a wide range of multifaceted problems. The integration approach of forming information resources has certain advantages over other approaches, in particular, regarding the information resources of intelligent systems of business analytics. The use of integration as a means of forming a set of consistent data has certain advantages, namely, it allows: combine data of different formats, content and origins in a single, consistent set; combine data without converting them to a single format, which is especially important when such conversion is difficult or impossible; creates virtual custom images of data that do not depend on their real appearance; creates opportunities to operate both real physical and virtual data in their combination; dynamically supplement, change and transform both the data itself and their descriptions; to provide uniform methods and technologies of perception and application of a large amount of various data.

[...] Read more.

Modeling and Development of a Computer Simulator with the Formation of Working Scenarios for Training Operator Personnel in the Search for Objects

By Taras Basyuk Andrii Vasyliuk Yuriy Ushenko Dmytro Uhryn Zhengbing Hu Mariia Talakh

DOI: https://doi.org/10.5815/ijmecs.2024.04.07, Pub. Date: 8 Aug. 2024

The article is dedicated to solving the problem of modeling and developing a computer simulator with the creation of working scenarios for training operating personnel in object detection. The analysis of the features of human operator activity is carried out, the model of his behavior is described, and it is shown that for the presented task, the following three levels must be taken into account: behavior based on abilities (skills), behavior based on rules, behavior based on knowledge. User models that are used in man-machine systems were created, and their use in the process of modeling operator activity from the point of view of regular and irregular exposure was shown. This made it possible to create a prototype of a graphical window using a user-friendly interface. A system model of human-machine interface for processing and recognition of visual information is mathematically described and a model of image representation based on three possible scenarios of their formation is formed. The result of the study was the software implementation of an effective educational tool prototype that accurately replicates real-world conditions for the formation of working scenarios. The conducted experimental research showed the possibility of general image recognition tests, selection of different test modes, and support for arbitrary sets of image test tasks. Further research will be aimed at expanding the
functionality of the created prototype, developing additional modules, automatically generating scenarios and verifying work.

[...] Read more.

Information Technology for the Operational Processing of Military Content for Commanders of Tactical Army Units

By Vitaliy Danylyk Victoria Vysotska Vasyl Andrunyk Dmytro Uhryn Yuriy Ushenko

DOI: https://doi.org/10.5815/ijcnis.2024.03.09, Pub. Date: 8 Jun. 2024

In the modern world, the military sphere occupies a very high place in the life of the country. At the same time, this area needs quick and accurate solutions. This decision can greatly affect the unfolding of events on the battlefield and indicate that they must be used carefully, using all possible means. During the war, the speed and importance of decisions are very important, and we note that the relevance of this topic is growing sharply. The purpose of the work is to create a comprehensive information system that facilitates the work of commanders of tactical units, which organizes the visualization and classification of aerial objects in real-time, the classification of objects for radio-technical intelligence, the structuring of military information and facilitates the perception of military information. The object of research/development is a phenomenon that creates a problematic problem, has the presence of slowing factors in the process of command and control, using teams of tactical links, which can slow down decision-making, as well as affect their correctness. The research/development aims to address emerging bottlenecks in the command-and-control process performed by tactical link teams, providing improved visualization, analysis and work with military data. The result of the work is an information system for processing military data to help commanders of tactical units. This system significantly improves on known officer assistance tools, although it includes a set of programs that have been used in parallel on an as-needed basis. Using modern information technologies and ease of use, the system covers problems that may arise for commanders. Also, each program included in the complex information system has its degree of innovation. The information system for structuring military information is distinguished by the possibility of use on any device. The information system for the visualization and clustering of aerial objects and the information system for the classification of objects for radio technical intelligence are distinguished by their component nature. This means that the application can use sources of input information and provides an API to use other processing information. Regarding the information system for integration into information materials, largely unknown terms and abbreviations are defined, so such solutions, cannot integrate the required data into real documents. Therefore, using this comprehensive information system, the command of tactical units will have the opportunity to improve the quality and achieve the command-and-control process.

[...] Read more.

Data Clustering by Chaotic Oscillatory Neural Networks with Dipole Synaptic Connections

By Roman Peleshchak Vasyl Lytvyn Ivan Peleshchak Dmytro Dudyk Dmytro Uhryn

DOI: https://doi.org/10.5815/ijmecs.2024.03.03, Pub. Date: 8 Jun. 2024

This article introduces a novel approach to data clustering based on the oscillatory chaotic neural network with dipole synaptic connections. The conducted research affirms that the proposed model effectively facilitates the formation of clusters of objects with similar properties due to the use of a slowly decreasing function of the dipole synaptic strength. The studies demonstrate that the degree of neuron synchronization in networks with dipole synaptic connections surpasses that in networks with Gaussian synaptic connections. The findings also indicate an increase in the interval of the resolution range in the model featuring dipole neurons, underscoring the effectiveness of the proposed method.

[...] Read more.

Information Technology for Generating Lyrics for Song Extensions Based on Transformers

By Oleksandr Mediakov Victoria Vysotska Dmytro Uhryn Yuriy Ushenko Cennuo Hu

DOI: https://doi.org/10.5815/ijmecs.2024.01.03, Pub. Date: 8 Feb. 2024

The article develops technology for generating song lyrics extensions using large language models, in particular the T5 model, to speed up, supplement, and increase the flexibility of the process of writing lyrics to songs with/without taking into account the style of a particular author. To create the data, 10 different artists were selected, and then their lyrics were selected. A total of 626 unique songs were obtained. After splitting each song into several pairs of input-output tapes, 1874 training instances and 465 test instances were obtained. Two language models, NSA and SA, were retrained for the task of generating song lyrics. For both models, t5-base was chosen as the base model. This version of T5 contains 223 million parameters. The analysis of the original data showed that the NSA model has less degraded results, and for the SA model, it is necessary to balance the amount of text for each author. Several text metrics such as BLEU, RougeL, and RougeN were calculated to quantitatively compare the results of the models and generation strategies. The value of the BLEU metric is the most diverse, and its value varies significantly depending on the strategy. At the same time, Rouge metrics have less variability and a smaller range of values. In total, for comparison, we used 8 different decoding methods for text generation supported by the transformers library, including Greedy search, Beam search, Diverse beam search, Multinomial sampling, Beam-search multinomial sampling, Top-k sampling, Top-p sampling, and Contrastive search. All the results of the lyrics comparison show that the best method for generating lyrics is beam search and its variations, including ray sampling. The contrastive search usually outperformed the usual greedy approach. The top-p and top-k methods do not have a clear advantage over each other, and in different situations, they produced different results.

[...] Read more.

Modelling of an Intelligent Geographic Information System for Population Migration Forecasting

By Dmytro Uhryn Yuriy Ushenko Vasyl Lytvyn Zhengbing Hu Olga Lozynska Victor Ilin Artur Hostiuk

DOI: https://doi.org/10.5815/ijmecs.2023.04.06, Pub. Date: 8 Aug. 2023

A generalized model of population migration is proposed. On its basis, models of the set of directions of population flows, the duration of migration, which is determined by its nature in time, type and form of migration, are developed. The model of indicators of actual migration (resettlement) is developed and their groups are divided. The results of population migration are described, characterized by a number of absolute and relative indicators for the purpose of regression analysis of data. To obtain the results of migration, the author takes into account the power of migration flows, which depend on the population of the territories between which the exchange takes place and on their location on the basis of the coefficients of the effectiveness of migration ties and the intensity of migration ties. The types of migration intensity coefficients depending on the properties are formed. The lightgbm algorithm for predicting population migration is implemented in the intelligent geographic information system. The migration forecasting system is also capable of predicting international migration or migration between different countries. The significance of conducting this survey lies in the increasing need for accurate and reliable migration forecasts. With globalization and the connectivity of nations, understanding and predicting migration patterns have become crucial for various domains, including social planning, resource allocation, and economic development. Through extensive experimentation and evaluation, developed migration forecasting system has demonstrated results of human migration based on machine learning algorithms. Performance metrics of migration flow forecasting models are investigated, which made it possible to present the results obtained from the evaluation of these models using various performance indicators, including the mean square error (MSE), root mean square error (RMSE) and R-squared (R2). The MSE and RMSE measure the root mean square difference between predicted and actual values, while the R2 represents the proportion of variance explained by the model.

[...] Read more.

Intelligent Analysis of Ukrainian-language Tweets for Public Opinion Research based on NLP Methods and Machine Learning Technology

By Oleh Prokipchuk Victoria Vysotska Petro Pukach Vasyl Lytvyn Dmytro Uhryn Yuriy Ushenko Zhengbing Hu

DOI: https://doi.org/10.5815/ijmecs.2023.03.06, Pub. Date: 8 Jun. 2023

The article develops a technology for finding tweet trends based on clustering, which forms a data stream in the form of short representations of clusters and their popularity for further research of public opinion. The accuracy of their result is affected by the natural language feature of the information flow of tweets. An effective approach to tweet collection, filtering, cleaning and pre-processing based on a comparative analysis of Bag of Words, TF-IDF and BERT algorithms is described. The impact of stemming and lemmatization on the quality of the obtained clusters was determined. Stemming and lemmatization allow for significant reduction of the input vocabulary of Ukrainian words by 40.21% and 32.52% respectively. And optimal combinations of clustering methods (K-Means, Agglomerative Hierarchical Clustering and HDBSCAN) and vectorization of tweets were found based on the analysis of 27 clustering of one data sample. The method of presenting clusters of tweets in a short format is selected. Algorithms using the Levenstein Distance, i.e. fuzz sort, fuzz set and Levenshtein, showed the best results. These algorithms quickly perform checks, have a greater difference in similarities, so it is possible to more accurately determine the limit of similarity. According to the results of the clustering, the optimal solutions are to use the HDBSCAN clustering algorithm and the BERT vectorization algorithm to achieve the most accurate results, and to use K-Means together with TF-IDF to achieve the best speed with the optimal result. Stemming can be used to reduce execution time. In this study, the optimal options for comparing cluster fingerprints among the following similarity search methods were experimentally found: Fuzz Sort, Fuzz Set, Levenshtein, Jaro Winkler, Jaccard, Sorensen, Cosine, Sift4. In some algorithms, the average fingerprint similarity reaches above 70%. Three effective tools were found to compare their similarity, as they show a sufficient difference between comparisons of similar and different clusters (> 20%).
The experimental testing was conducted based on the analysis of 90,000 tweets over 7 days for 5 different weekly topics: President Volodymyr Zelenskyi, Leopard tanks, Boris Johnson, Europe, and the bright memory of the deceased. The research was carried out using a combination of K-Means and TF-IDF methods, Agglomerative Hierarchical Clustering and TF-IDF, HDBSCAN and BERT for clustering and vectorization processes. Additionally, fuzz sort was implemented for comparing cluster fingerprints with a similarity threshold of 55%. For comparing fingerprints, the most optimal methods were fuzz sort, fuzz set, and Levenshtein. In terms of execution speed, the best result was achieved with the Levenshtein method. The other two methods performed three times worse in terms of speed, but they are nearly 13 times faster than Sift4. The fastest method is Jaro Winkler, but it has a 19.51% difference in similarities. The method with the best difference in similarities is fuzz set (60.29%). Fuzz sort (32.28%) and Levenshtein (28.43%) took the second and third place respectively. These methods utilize the Levenshtein distance in their work, indicating that such an approach works well for comparing sets of keywords. Other algorithms fail to show significant differences between different fingerprints, suggesting that they are not adapted to this type of task.

[...] Read more.

Information Technologies for Decision Support in Industry-Specific Geographic Information Systems based on Swarm Intelligence

By Vasyl Lytvyn Olga Lozynska Dmytro Uhryn Myroslava Vovk Yuriy Ushenko Zhengbing Hu

DOI: https://doi.org/10.5815/ijmecs.2023.02.06, Pub. Date: 8 Apr. 2023

A method of choosing swarm optimization algorithms and using swarm intelligence for solving a certain class of optimization tasks in industry-specific geographic information systems was developed considering the stationarity characteristic of such systems. The method consists of 8 stages. Classes of swarm algorithms were studied. It is shown which classes of swarm algorithms should be used depending on the stationarity, quasi-stationarity or dynamics of the task solved by an industry geographic information system. An information model of geodata that consists in a formalized combination of their spatial and attributive components, which allows considering the relational, semantic and frame models of knowledge representation of the attributive component, was developed. A method of choosing optimization methods designed to work as part of a decision support system within an industry-specific geographic information system was developed. It includes conceptual information modeling, optimization criteria selection, and objective function analysis and modeling. This method allows choosing the most suitable swarm optimization method (or a set of methods).

[...] Read more.

MECS Press Menu

Dmytro Uhryn

Author Articles

Information Engineering for Fake Job Postings Classification in Electronic Business Based on Machine Learning Technology

Sentiment Analysing and Visualising Public Opinion on Political Figures across YouTube and Twitter Using NLP and Machine Learning

Local Agentic RAG-Based Information System Development for Intelligent Analysis of GitHub Code Repositories in Computer Science Education

Smart Tool for Identifying Misinformation Spread Sources and Routes in Social Networks Based on NLP and Machine Learning

Smart Application for Recruiting Based on Natural Language Processing Methods, Transformer Models and Siamese Neural Network Architecture

Information Technology for Modelling Social Trends in Telegram Using E5 Vectors and Hybrid Cluster Analysis

Smart Tool for Text Content Analysis to Identify Key Propaganda Narratives and Disinformation in News Based on NLP and Machine Learning

Application of Tensor Networks Analysis to Optimize Traffic Management in a Critical Information and Telecommunications Network

Comprehensive Intellectual Analysis of Statistical Data on Leading Energy Companies’ Actions

Development and Testing of Voice User Interfaces Based on BERT Models for Speech Recognition in Distance Learning and Smart Home Systems

Intelligent Application for Predicting Diabetes Spread Risk in the World Based on Machine Learning

Agile Technology of Information Data Engineering for Intelligent Analysis of the Happiness Index and Life Satisfaction in Known World Cities

Agile Methodology of Information Engineering for Semantic Annotations Categorization and Creation in Scientific Articles Based on NLP and Machine Learning Methods

Intelligent Application for Textual Content Authorship Identification based on Machine Learning and Sentiment Analysis

Agile Intelligent Information Technology for Speech Synthesis Based on Transfer Function Approximation Methods Using Continued Fractions

Agile Intelligent Software Solution for Textual Content Authorship Identification Based on NLP, Artificial Intelligence and Machine Learning

Agile Methodology for Identifying Original and Fake Printed Documents based on Secret Raster Formation

Intelligent Processing Censoring Inappropriate Content in Images, News, Messages and Articles on Web Pages Based on Machine Learning

Recognizing Fakes, Propaganda and Disinformation in Ukrainian Content based on NLP and Machine-learning Technology

Information Technology for Sound Analysis and Recognition in the Metropolis based on Machine Learning Methods

Modified Kalman Filter with Chebyshev Points Based on a Recurrent Neural Network for Automatic Control System Measuring Channels Diagnosing and Parring off Failures

Information Technology for Gender Voice Recognition Based on Machine Learning Methods

Intelligent Network Architecture Development for E-Business Processes Based on Ontological Models

Disinformation, Fakes and Propaganda Identifying Methods in Online Messages Based on NLP and Machine Learning Methods

Polymorphic Radial Basis Functions Neural Network

Universal On-board Neural Network System for Restoring Information in Case of Helicopter Turboshaft Engine Sensor Failure

Information Technology for the Data Integration in Intelligent Systems of Business Analytics

Modeling and Development of a Computer Simulator with the Formation of Working Scenarios for Training Operator Personnel in the Search for Objects

Information Technology for the Operational Processing of Military Content for Commanders of Tactical Army Units

Data Clustering by Chaotic Oscillatory Neural Networks with Dipole Synaptic Connections

Information Technology for Generating Lyrics for Song Extensions Based on Transformers

Modelling of an Intelligent Geographic Information System for Population Migration Forecasting

Intelligent Analysis of Ukrainian-language Tweets for Public Opinion Research based on NLP Methods and Machine Learning Technology

Information Technologies for Decision Support in Industry-Specific Geographic Information Systems based on Swarm Intelligence

Other Articles