International Journal of Information Engineering and Electronic Business(IJIEEB)

ISSN: 2074-9023 (Print), ISSN: 2074-9031 (Online)

Published By: MECS Press

IJIEEB Vol.13, No.6, Dec. 2021

An Efficient Smote-based Model for Dyslexia Prediction

Full Text (PDF, 527KB), PP.13-21


Vani Chakraborty, Meenatchi Sundaram

Index Terms

Dyslexia, gamified test, SMOTE, ADASYN, Near-miss.


Dyslexia is a learning disability that makes it difficult for an individual to read, write, spell, and do simple mathematical calculations. It affects almost 10% of the global population, and early detection is paramount for handling it effectively. There are many methods for detecting the risk of dyslexia, including assessment tools, handwriting recognition, expert psychological help, and eye movement data recorded while reading. Another convenient and easy way to detect the risk of dyslexia is to have an individual play a simple game that tests phonological awareness, syllabic awareness, auditory discrimination, lexical awareness, visual working memory, and more, and to record the observations. The proposed research work presents an effective way of predicting the risk of dyslexia with high accuracy and reliability. It uses a dataset available from the Kaggle repository to predict the risk of dyslexia with various machine learning algorithms. The dataset has an unequal distribution of positive and negative cases, so classification accuracy is compromised if it is used directly. The proposed work therefore applies three resampling techniques to reduce the imbalance: undersampling with the near-miss algorithm, and oversampling with SMOTE and with ADASYN. After near-miss undersampling, the best accuracy was 81.63%, given by the SVC classifier; all the other classifiers in the experiment produced accuracies in the range of 64% to 79.08%. After oversampling with SMOTE, the classifiers produced very good results on the evaluation metrics of accuracy, CV score, F1 score, and recall. The maximum accuracy, 96.37%, was given by RandomForest, closely followed by XGBoost and GradientBoosting with an accuracy of 95.14%. Decision tree, SVC, and AdaBoost achieved accuracies of 91.26%, 93.36%, and 93.48%, respectively. The CV score, F1 score, and recall were also considerably high for all these classifiers. After oversampling with ADASYN, the RandomForest algorithm gave the maximum accuracy of 96.25%. Of the two oversampling techniques, SMOTE produced slightly better evaluation metrics than ADASYN. The proposed system has very high reliability and so can be effectively used for detecting the risk of dyslexia.
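The oversampling idea behind SMOTE can be illustrated with a minimal sketch. It is not the authors' implementation, and it simplifies the published algorithm: the function name `smote`, its parameters, and the toy data are assumptions made for illustration only. Each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified sketch (hypothetical helper, not a library API).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # from the base point to the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (made up): 8 negative cases and 3 positive cases, two features each.
negatives = [[0.1 * i, 1.0 + 0.05 * i] for i in range(8)]
positives = [[2.0, 2.0], [2.2, 2.1], [2.1, 2.4]]

# Generate enough synthetic positives to match the number of negatives.
new_points = smote(positives, n_new=len(negatives) - len(positives), k=2)
balanced_positives = positives + new_points
print(len(negatives), len(balanced_positives))
```

In practice the resampling would be applied to the training split only, so that synthetic samples do not leak into the data used to compute accuracy, F1 score, or recall.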

Cite This Paper

Vani Chakraborty, Meenatchi Sundaram, "An Efficient Smote-based Model for Dyslexia Prediction", International Journal of Information Engineering and Electronic Business(IJIEEB), Vol.13, No.6, pp. 13-21, 2021. DOI: 10.5815/ijieeb.2021.06.02

