IJISA Vol. 17, No. 1, 8 Feb. 2025
Cardiovascular, Data Transformation, Ensemble Learning, Heart Disease, Machine Learning
About one person dies every minute from cardiovascular disease, which has almost surpassed war as the leading cause of death in the twenty-first century. In cardiology, early and accurate diagnosis of heart disease is a cornerstone of effective healthcare. Predictive analytics based on machine-learning algorithms can contribute to the early detection of cardiovascular disease. This study evaluates the data preprocessing techniques used in building machine learning models that predict cardiovascular disease and identify the features contributing to heart attacks. A novel data transformation technique, the superlative boundary binning method, is proposed to enhance machine learning and ensemble learning classification models that predict cardiac illness from independent physiological features. The results show that the ensemble learning classifier AdaBoost, combined with the superlative boundary binning method, outperformed the other combinations of data transformation techniques and machine learning classifiers, with a classification accuracy of 93%.
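The data transformation step described above replaces a continuous physiological measurement with a discrete bin index before classification. The sketch below illustrates that general idea with ordinary equal-width binning. It is not the superlative boundary binning method, whose boundary rule is not given here. The function names and the blood pressure readings are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points of equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def bin_feature(values, edges):
    """Map each value to a bin index in 0..len(edges).

    A value equal to a cut point is assigned to the upper bin.
    """
    return [bisect_right(edges, v) for v in values]


# Hypothetical resting blood pressure readings (mm Hg), not study data.
resting_bp = [94, 110, 120, 128, 135, 140, 152, 160, 178, 200]

# Assumption: four equal-width bins; the paper's method picks boundaries differently.
edges = equal_width_edges(resting_bp, n_bins=4)
binned = bin_feature(resting_bp, edges)

print(edges)   # interior cut points
print(binned)  # one bin index per reading
```

The binned column would then take the place of the raw measurement in the input to a classifier such as AdaBoost. In practice, the cut points should be computed on the training split only and reused unchanged on the test split.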
J. Cruz Antony, E. Murali, D. Deepa, R. Vignesh, S. Hemalatha, Umme Fahad, "Data Transformation and Predictive Analytics of Cardiovascular Disease Using Machine and Ensemble Learning Techniques", International Journal of Intelligent Systems and Applications (IJISA), Vol. 17, No. 1, pp. 88-97, 2025. DOI: 10.5815/ijisa.2025.01.06