Work place: Aliah University, Kolkata, India
E-mail: mesouvik@hotmail.com
Website: https://orcid.org/0000-0003-1842-7523
Research Interests: Computer Science & Information Technology, Medical Informatics, Computational Learning Theory, Medical Image Computing
Biography
Dr. Souvik Sengupta is currently an Associate Professor, Department of Computer Science and Engineering, Aliah University, Kolkata, India. He received his PhD (Tech) from the University of Calcutta in 2017. His research interests include Data Science, Machine Learning, Bio-medical image analysis, NLP, and Education Technology.
DOI: https://doi.org/10.5815/ijmecs.2023.03.04, Pub. Date: 8 Jun. 2023
Early prediction of students' academic performance helps to identify at-risk students and enables management to take corrective action before they fall behind. Most research in this field has applied supervised machine learning approaches to custom-built datasets with numerous attributes or features. Since these datasets are not publicly available, it is hard to understand and compare the significance of the chosen features and the efficacy of the different machine learning models employed in the classification task. In this work, we analyzed 27 research papers published in the last ten years (2011-2021) that used machine learning models for predicting students' performance. We identify the most frequently used features in the private datasets, their interrelationships, and abstraction levels. We also explored three popular public datasets and performed statistical analyses such as the Chi-square test and Pearson's correlation on their features. A minimal set of essential features is prepared by fusing the frequent features and the statistically significant features. We propose an algorithm for selecting a minimal set of features from any dataset with a given set of features. We compared the performance of different machine learning models on the three public datasets in two experimental setups: one with the complete feature set and the other with a minimal set of features. Compared to using the complete feature set, most supervised models perform nearly identically and, in some cases, even better with the reduced feature set. The proposed method is capable of identifying the most essential feature set from any new dataset for predicting students' performance.
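The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that categorical features are screened with a Chi-square statistic, numeric features with Pearson's correlation, and that "fusing" means taking the union of the statistically significant features with the frequently used features that are present in the dataset. The function names, thresholds, and toy data are all hypothetical.

```python
from collections import Counter
from math import sqrt


def chi_square_stat(xs, ys):
    """Chi-square statistic of independence for two categorical sequences."""
    n = len(xs)
    row_counts = Counter(xs)
    col_counts = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for r, r_count in row_counts.items():
        for c, c_count in col_counts.items():
            expected = r_count * c_count / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def pearson_r(xs, ys):
    """Pearson's correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def select_features(categorical, numeric, labels, scores, frequent,
                    chi2_threshold=3.841, r_threshold=0.3):
    """Hypothetical fusion rule: significant features UNION frequent features.

    chi2_threshold=3.841 is the 5% critical value for 1 degree of freedom,
    so it is only appropriate for 2x2 tables; r_threshold is an arbitrary
    illustrative cut-off. Both are assumptions, not values from the paper.
    """
    significant = set()
    for name, values in categorical.items():
        if chi_square_stat(values, labels) > chi2_threshold:
            significant.add(name)
    for name, values in numeric.items():
        if abs(pearson_r(values, scores)) >= r_threshold:
            significant.add(name)
    available = set(categorical) | set(numeric)
    # Frequent features missing from this dataset are silently dropped.
    return sorted(significant | (set(frequent) & available))


# Toy data, invented purely for illustration.
labels = ["pass"] * 4 + ["fail"] * 4
scores = [18, 17, 16, 15, 8, 7, 6, 5]
categorical = {
    "internet": ["yes"] * 4 + ["no"] * 4,
    "gender": ["m", "f"] * 4,
}
numeric = {
    "absences": [0, 1, 2, 3, 6, 7, 8, 9],
    "age": [16, 17] * 4,
}
frequent = ["gender", "previous_grade"]

selected = select_features(categorical, numeric, labels, scores, frequent)
print(selected)
```

In this toy run, `internet` and `absences` pass the statistical screens, `gender` is kept only because it appears in the frequent list, and `age` is dropped. Using an intersection instead of a union would give a stricter fusion rule; the abstract does not say which the authors use.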