Souvik Sengupta

Work place: Aliah University, Kolkata, India

E-mail: mesouvik@hotmail.com

Website: https://orcid.org/0000-0003-1842-7523

Research Interests: Computer Science & Information Technology, Medical Informatics, Computational Learning Theory, Medical Image Computing

Biography

Dr. Souvik Sengupta is currently an Associate Professor in the Department of Computer Science and Engineering, Aliah University, Kolkata, India. He received his PhD (Tech) from the University of Calcutta in 2017. His research interests include data science, machine learning, biomedical image analysis, natural language processing (NLP), and education technology.

Author Articles
Towards Finding a Minimal Set of Features for Predicting Students' Performance Using Educational Data Mining

By Souvik Sengupta

DOI: https://doi.org/10.5815/ijmecs.2023.03.04, Pub. Date: 8 Jun. 2023

Early prediction of students' academic performance helps identify at-risk students and enables management to take corrective action before they go astray. Most research in this field has applied supervised machine learning to self-curated datasets with numerous attributes or features. Because these datasets are not publicly available, it is hard to understand and compare the significance of the chosen features and the efficacy of the different machine learning models used for classification. In this work, we analyzed 27 research papers published over the last ten years (2011-2021) that used machine learning models to predict students' performance. We identified the most frequently used features in these private datasets, their interrelationships, and their levels of abstraction. We also explored three popular public datasets and performed statistical analyses, such as the Chi-square test and Pearson's correlation, on their features. A minimal set of essential features was prepared by fusing the frequent features with the statistically significant ones. We propose an algorithm for selecting a minimal feature set from any dataset, given its full set of features. We compared the performance of different machine learning models on the three public datasets in two experimental setups: one with the complete feature set and one with the minimal feature set. Most supervised models performed nearly identically with the reduced feature set as with the complete one, and in some cases even better. The proposed method can identify the most essential feature set from any new dataset for predicting students' performance.
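The abstract does not spell out the selection algorithm, so the following is only a hypothetical Python sketch of the general idea it describes: keep categorical features that pass a Chi-square test of independence against a categorical outcome, keep numeric features with a sizeable Pearson correlation to a numeric score, and then take the union with features reported frequently in prior studies. The function names, the thresholds (`alpha=0.05`, `min_abs_r=0.3`), the column names in the example, and the use of the Wilson-Hilferty approximation for the Chi-square p-value are all assumptions for illustration, not the paper's method.

```python
import math
from collections import Counter


def chi2_pvalue(feature, target):
    """Chi-square test of independence between two categorical columns.

    Assumption: the p-value uses the Wilson-Hilferty approximation to the
    chi-square survival function, so it is approximate (fine for thresholding).
    """
    n = len(feature)
    joint = Counter(zip(feature, target))
    row_tot, col_tot = Counter(feature), Counter(target)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    if df == 0:
        return 1.0  # a constant column carries no information
    h = 2.0 / (9.0 * df)
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def pearson_r(x, y):
    """Pearson's correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def minimal_feature_set(categorical, numeric, label, score, frequent,
                        alpha=0.05, min_abs_r=0.3):
    """Hypothetical fusion of literature-frequent and significant features.

    categorical: {name: list of category values}
    numeric:     {name: list of numbers}
    label:       categorical outcome (e.g. pass/fail) for the Chi-square test
    score:       numeric outcome (e.g. final grade) for Pearson's r
    frequent:    feature names reported often in prior studies
    """
    significant = {name for name, col in categorical.items()
                   if chi2_pvalue(col, label) < alpha}
    significant |= {name for name, col in numeric.items()
                    if abs(pearson_r(col, score)) >= min_abs_r}
    available = set(categorical) | set(numeric)
    # Union of both sources, keeping only frequent features present in this dataset.
    return sorted((set(frequent) & available) | significant)


if __name__ == "__main__":
    # Toy data with hypothetical column names, for illustration only.
    label = ["pass"] * 6 + ["fail"] * 6
    score = [15, 14, 16, 13, 15, 14, 8, 9, 7, 10, 8, 9]
    categorical = {"internet": ["yes"] * 6 + ["no"] * 6, "sex": ["M", "F"] * 6}
    numeric = {"studytime": [4, 3, 4, 3, 4, 3, 1, 2, 1, 2, 1, 2],
               "absences": [1, 5] * 6}
    print(minimal_feature_set(categorical, numeric, label, score,
                              frequent=["sex", "failures"]))
    # → ['internet', 'sex', 'studytime']
```

In this toy run, `internet` and `studytime` are kept as statistically related to the outcome, `sex` survives only because it is in the frequent list, `absences` (zero correlation) is dropped, and `failures` is ignored because the dataset lacks that column. Taking a union rather than an intersection is one plausible reading of "fusing"; a stricter variant could intersect the two sources instead.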

Other Articles