Agile Methodology of Information Engineering for Semantic Annotations Categorization and Creation in Scientific Articles Based on NLP and Machine Learning Methods

By Danylo Levkivskyi Victoria Vysotska Lyubomyr Chyrun Yuriy Ushenko Dmytro Uhryn Cennuo Hu

DOI: https://doi.org/10.5815/ijieeb.2025.02.01, Pub. Date: 8 Apr. 2025

Research on the categorization and creation of semantic annotations for scientific articles is an essential direction of development given the growing volume of scientific literature. Applying machine learning and natural language processing in this field makes it possible to organize scientific information effectively and provide access to it. The article discusses methods of automatic text annotation. Based on the review, a constraint propagation model is proposed to improve the technique of text relationship maps. The developed software system automates the analysis and categorization of scientific materials, which improves the speed and accuracy with which researchers can find the information they need. The use of advanced machine learning models, such as RoBERTa and RAG, supports high-quality data processing and creation of semantic annotations. After the model was improved, the accuracy of predicting article categories reached 88%. The novelty of the approach is the combination of categorization and semantic annotation to make searching for scientific information faster and more convenient. The software system can be extended and improved in the future with more advanced technologies and machine learning models. The study is relevant, original in its approach, and has potential for practical application in scientific research and the development of science as a whole. The proposed approach contributes to the Information Engineering and Electronic Business field in the following key aspects: automation of categorization and annotation of scientific articles, improved accuracy of information search, increased efficiency of scientific research, and a flexible, scalable solution.


Classification of Multilingual Financial Tweets Using an Ensemble Approach Driven by Transformers

By Rupam Bhattacharyya

DOI: https://doi.org/10.5815/ijieeb.2025.02.02, Pub. Date: 8 Apr. 2025

There is growing interest in multilingual tweet analysis using advanced deep learning techniques. Identifying the sentiments of Twitter (currently known as X) users during an IPO (Initial Public Offering) is an important application area in the financial domain, yet few research works address it. In this paper, we introduce a multilingual dataset called the LIC IPO dataset. In addition to the dataset, this work offers a modified majority voting-based ensemble technique. This test-time ensembling technique is driven by fine-tuning state-of-the-art transformer-based pretrained language models used in multilingual natural language processing (NLP) research. Our technique has been employed to perform sentiment analysis over the LIC IPO dataset. The paper reports a performance evaluation of our technique alongside five transformer-based multilingual NLP models on this dataset: a) Bernice, b) TwHIN-BERT, c) MuRIL, d) mBERT, and e) XLM-RoBERTa. We find that our test-time ensemble technique solves the multi-class sentiment classification problem defined over the proposed dataset better than the individual transformer models. Encouraging experimental outcomes confirm the efficacy of the proposed approach.
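The idea of test-time ensembling by voting can be illustrated with a minimal sketch. The abstract does not describe how the majority vote is modified, so the tie-breaking rule below (summed model confidence) is an assumption made purely for illustration, and the function name `majority_vote` is hypothetical rather than taken from the paper.

```python
from collections import Counter


def majority_vote(predictions, confidences=None):
    """Combine one predicted label per model into a single label.

    predictions: list of labels, one per model.
    confidences: optional list of per-model confidence scores.
    Tie-breaking by summed confidence is an illustrative assumption,
    not the modification used in the paper.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    # Labels sharing the highest vote count, in first-seen order.
    tied = [label for label in counts if counts[label] == top]
    if len(tied) == 1 or confidences is None:
        return tied[0]
    # Break the tie using the total confidence behind each tied label.
    score = {label: 0.0 for label in tied}
    for label, conf in zip(predictions, confidences):
        if label in score:
            score[label] += conf
    return max(tied, key=lambda label: score[label])


# Five hypothetical model outputs for one tweet.
votes = ["positive", "negative", "positive", "neutral", "positive"]
print(majority_vote(votes))
```

With five voters and three classes, ties remain possible (for example 2-2-1), which is why some explicit tie-breaking rule is needed in any such scheme.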


Development of Past Learning Recognition Assessment Data Processing System for Professional Engineer Program Using Scrum Method

By Trisya Septiana Dikpride Despa Fadil Hamdani Deny Budiyanto Reza Andrea

DOI: https://doi.org/10.5815/ijieeb.2025.02.03, Pub. Date: 8 Apr. 2025

The University of Lampung is one of the universities mandated to run the Professional Engineer Program (PPI) through the Past Learning Recognition (RPL) pathway. Individuals following this RPL path must have at least five years of experience in the engineering field, and their education, work, and training data from formal and informal institutions can be converted into six courses totaling 24 credits. The RPL data assessment process, if conducted manually, takes a long time and hampers the administrative process in PPI. Therefore, the assessment process is automated through a web-based application, the RPL data final grade processing system (E-RAPEL), which addresses common problems in PPI and facilitates grade administration. The system development adopts the Scrum method to enhance product performance, teamwork, and the work environment. Data were collected through interviews and direct observation, and the system was evaluated using black box testing. The findings show that all tested components functioned as expected and that the system reduced the time required for the RPL data final assessment process in PPI.


Formation of Innovativeness for the Business Processes of Enterprise Using Data Processing

By Zarina Poberezhna Maksym Zaliskyi Anton Kniaziev

DOI: https://doi.org/10.5815/ijieeb.2025.02.04, Pub. Date: 8 Apr. 2025

The article discusses the development and analysis of diagnostic procedures for business processes in enterprise management. Digitalization has become a priority at the state level in every country, influencing the daily lives of citizens and the activity of enterprises. As a result, the ability to gather, analyze, process, and use data has taken center stage in supporting effective decision-making and sustaining competitive market positions. The article considers the factors influencing the choice of data processing tools, analyzes the difficulties faced when implementing data processing methods, and outlines the essential features of such systems for effective management of enterprise activity. The main attention is paid to the development of a data processing method for diagnosing the state of business processes when assessing their compliance. The method involves calculating the probability density function for the costs of restoring the normal functioning of business processes and the statistical characteristics of the probability of correct decision-making. Additionally, the article includes numerical examples demonstrating the application of this method to the business processes of an aviation enterprise that provides and performs technological procedures for aircraft operation. The proposed data processing model can be used to analyze the efficiency of enterprises' business processes and to make decisions on optimizing the organizational structure to minimize the enterprise's costs.


URLGuard: A Holistic Hybrid Machine Learning Approach for Phishing Detection

By Pradip M. Paithane

DOI: https://doi.org/10.5815/ijieeb.2025.02.05, Pub. Date: 8 Apr. 2025

The fast growth of Internet technology has significantly changed online users' experiences, while security concerns are becoming increasingly pressing. Among these concerns, phishing stands out as a prominent criminal activity that uses social engineering and technology to steal a victim's identity data and account information. According to the Anti-Phishing Working Group (APWG), the number of phishing detections increased by 46% in the first quarter of 2018 compared to the fourth quarter of 2017. To address this, the paper introduces a phishing detection system using a hybrid machine learning approach based on URL attributes. It addresses the growing threat of phishing attacks that exploit email manipulation and fake websites to deceive users and steal sensitive data. The study employs a phishing URL dataset with over 11,000 websites, extracted from a reputable repository. After pre-processing, a hybrid machine learning model that includes Decision Tree, Random Forest, and XGB is employed to safeguard against phishing URLs. The proposed approach is evaluated with key metrics such as precision, accuracy, recall, F1-score, and specificity. Results demonstrate that the proposed method surpasses other models, achieving superior accuracy and efficiency in detecting phishing attacks.
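The evaluation metrics named above can all be derived from the four confusion-matrix counts of a binary classifier. The following is a minimal sketch under the assumption that phishing is the positive class (label 1); the function name `binary_metrics` and the sample labels are hypothetical and are not taken from the paper or its dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1 and specificity.

    Assumes a binary task where `positive` marks the phishing class
    (an illustrative choice). Ratios with a zero denominator are
    reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


# Made-up labels: 1 = phishing URL, 0 = legitimate URL.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Specificity matters in this setting because it measures how often legitimate URLs are correctly left alone, which accuracy alone can hide when the classes are imbalanced.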


IoT Based Smart Energy Consumption Prediction for Home Appliances

By Atiqur Rahman Sadia Hossain Samsuddin Ahmed Md. Toukir Ahmed

DOI: https://doi.org/10.5815/ijieeb.2025.02.06, Pub. Date: 8 Apr. 2025

Optimizing energy management for household appliances is essential for maximizing domestic energy utilization and enabling preventive maintenance. Recent studies indicate that traditional forecasting approaches frequently lack the accuracy and real-time learning capabilities required for effective management of household energy. This study demonstrates the implementation of a comprehensive strategy that integrates Internet of Things (IoT) data, machine learning (ML), and explainable artificial intelligence (XAI) to improve the accuracy and interpretability of predicting energy usage in residential buildings. Our research focuses on the rising issues faced by IoT-based smart systems, particularly the deficiencies in the performance of current solutions. Compared with the other 17 models examined, polynomial regression demonstrated the strongest performance. Our solution uses a non-intrusive sensor to collect data without disrupting the appliance's operation. Real-time data collection is achieved through a Flask-based web page, with Ngrok providing external access. The efficacy of the proposed system was assessed using several metrics, yielding highly satisfactory results: the root mean square error (RMSE) was 0.03, the mean absolute error (MAE) was 0.02, the mean absolute percentage error (MAPE) was 0.04, and the coefficient of determination (R²) was 0.9989. However, modern cutting-edge methods still face considerable hurdles when it comes to interpretability. To tackle these problems, we include XAI techniques such as SHAP and LIME, which improve the interpretability of the model by elucidating the impact of various variables on energy consumption forecasts. Not only does this increase the effectiveness of the model, but it also promotes comprehension of the data and enables users to identify the elements that influence home energy usage.
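The four reported error measures follow standard definitions, sketched below. This is an illustrative implementation only: the function name `regression_metrics` and the sample readings are hypothetical, and MAPE is returned as a fraction rather than a percentage because the abstract does not state which convention it uses.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (as a fraction) and R² for two sequences.

    Assumes equal-length inputs, no zero values in y_true (required
    for MAPE) and non-constant y_true (required for R²).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Made-up energy readings and predictions, for illustration only.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(regression_metrics(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both, as the study does, gives a sense of whether the error is spread evenly or concentrated in a few readings.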


Data Deduplication-based Efficient Cloud Optimisation Technique: Optimizing Cloud Storage through Data Deduplication

By Ranga Kavitha Mahaboob Sharief Shaik Narala Swarnalatha M. Pujitha Syed Asadullah Hussaini Samiullah Khan Shamsher Ali

DOI: https://doi.org/10.5815/ijieeb.2025.02.07, Pub. Date: 8 Apr. 2025

Effective storage management is crucial to the speed and cost of cloud computing systems, given the exponential growth of data. The significance of this issue has increased as the amount of data continues to grow at an alarming pace. Detecting and removing duplicate data can enhance storage utilisation and system efficiency. Using less storage capacity reduces data transmission costs and enhances cloud infrastructure scalability. Applying deduplication techniques at scale, however, presents a number of important obstacles, including security issues, deduplication delays, and the difficulty of maintaining data integrity.
This paper introduces a method called Data Deduplication-based Efficient Cloud Optimisation Technique (DD-ECOT), intended to optimise storage processes and enhance performance in cloud-based systems. DD-ECOT combines advanced pattern recognition with chunking to increase storage efficiency at minimal cost. It protects data during deduplication with secure hash-based indexing. Parallel processing and a scalable design decrease latency, making it adaptable enough for vast, ever-changing cloud setups. The DD-ECOT system avoids these problems by employing a secure hash-based indexing method to keep data intact and by using parallel processing to speed up deduplication without impacting system performance. Enterprise cloud storage systems, disaster recovery solutions, and large-scale data management environments are some of the use cases for DD-ECOT. Analysis of simulations shows that the suggested solution outperforms conventional deduplication techniques in terms of storage efficiency, data retrieval speed, and overall system performance. The findings suggest that DD-ECOT can improve cloud service delivery while cutting operational costs. In simulation, DD-ECOT boosts storage efficiency by 92.8% by reducing duplicate data. It reduces latency by 97.2% using parallel processing and sophisticated deduplication. Additionally, secure hash-based indexing methods improve data integrity to 98.1%. Optimized bandwidth usage of 95.7% makes data transfer efficient. These improvements suggest DD-ECOT may save operational costs, optimize storage, and outperform current deduplication methods.
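The general principle of chunk-level deduplication with a hash-based index can be sketched as follows. This is not the DD-ECOT algorithm: the abstract gives no implementation details, so the fixed-size chunking, the choice of SHA-256, and the class name `DedupStore` are all assumptions made for illustration, and pattern recognition and parallel processing are omitted.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps each unique chunk only once.

    Fixed-size chunking and SHA-256 digests are illustrative
    assumptions, not details taken from the paper.
    """

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes (the hash-based index)
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this digest has not been seen before.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name):
        parts = []
        for digest in self.files[name]:
            chunk = self.chunks[digest]
            # Integrity check: the stored chunk must still match its digest.
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise ValueError("chunk failed integrity check")
            parts.append(chunk)
        return b"".join(parts)

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("a.bin", b"ABCDABCDEFGH")
store.put("b.bin", b"EFGHABCD")
print(store.stored_bytes(), "bytes stored for 20 bytes of input")
```

Because files are reconstructed from digest lists, the same digest serves both as the deduplication key and as an integrity check on retrieval, which is the usual motivation for hash-based indexing in such systems.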


International Journal of Information Engineering and Electronic Business (IJIEEB)

MECS Press Journal
