IJITCS Vol. 9, No. 7, 8 Jul. 2017
pp. 61-68
ChiMerge Discretization, Feature Selection, Median Based Discretization, Naive Bayesian Classifier, Predictive Accuracy and Relevant Features
Most data mining and machine learning algorithms work better with discrete data than with continuous data. Real-world data, however, are not always discrete, so continuous features often need to be discretized. Several discretization methods are available in the literature. This paper compares two of them: Median Based Discretization and ChiMerge discretization. The discretized values obtained with each method are used to compute feature relevance using Information Gain. The original features are then ranked by relevance under each method, and the top-ranked attributes are selected as the most relevant ones. The selected attributes are fed into the Naive Bayesian Classifier to determine the predictive accuracy. The experimental results show that the Naive Bayesian Classifier achieves significantly higher predictive accuracy with the features selected using Information Gain with Median Based Discretization than with those selected using Information Gain with ChiMerge discretization.
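The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes that Median Based Discretization splits each continuous feature into two intervals at its median, which this page does not spell out, and it uses the standard entropy-based definition of Information Gain. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from statistics import median


def median_discretize(values):
    """Assumed scheme: map each value to 0 if it is <= the median, else 1."""
    m = median(values)
    return [0 if v <= m else 1 for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the expected entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Discretize every feature at its median and sort feature names by gain."""
    gains = {
        name: information_gain(median_discretize(col), labels)
        for name, col in columns.items()
    }
    return sorted(gains, key=gains.get, reverse=True), gains


# Hypothetical toy data: f1 separates the classes at its median, f2 barely does.
columns = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f2": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
labels = ["a", "a", "a", "b", "b", "b"]
ranking, gains = rank_features(columns, labels)
print(ranking)
```

In a full pipeline the top-ranked features would then be passed to a Naive Bayesian classifier. Swapping `median_discretize` for a ChiMerge implementation would give the second ranking that the paper compares against.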
P. Kalpana, K. Mani, "An Exploratory Analysis between the Feature Selection Algorithms IGMBD and IGChiMerge", International Journal of Information Technology and Computer Science (IJITCS), Vol.9, No.7, pp.61-68, 2017. DOI:10.5815/ijitcs.2017.07.07