A Hybrid Approach for Class Imbalance Problem in Customer Churn Prediction: A Novel Extension to Under-sampling



Author(s)

Uma R. Salunkhe 1,*, Suresh N. Mali 2

1. Smt. Kashibai Navale College of Engineering, Savitribai Phule Pune University, Pune, 411041, India

2. Sinhgad Institute of Technology and Science, Pune, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2018.05.08

Received: 12 May 2017 / Revised: 16 Jun. 2017 / Accepted: 6 Jul. 2017 / Published: 8 May 2018

Index Terms

Imbalanced data, Re-sampling, Under-sampling, Classifier ensemble, Churn prediction

Abstract

Customer retention is becoming a key success factor for many businesses due to increasing market competition. Telecom companies in particular face this challenge because the number of service providers is growing rapidly. Hence there is a need for customer churn prediction, i.e., detecting the customers who are likely to churn (switch from one service provider to another). Several data mining techniques have been applied to classify customers into churn and non-churn categories. However, churn prediction datasets typically have an imbalanced class distribution, with churners forming a small minority of customers.
Re-sampling is one of the most commonly used techniques for handling imbalanced data because it is independent of the classifier being used. In this paper, we develop a hybrid re-sampling approach named SOS-BUS by combining the well-known oversampling technique SMOTE with our novel under-sampling technique. Our methodology focuses on the necessary data of the majority class and avoids removing it, thereby overcoming the main limitation of random under-sampling, which can discard useful majority-class instances. Experimental results show that the proposed approach outperforms the reference techniques in terms of Area Under the ROC Curve (AUC).
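The abstract does not specify the SOS-BUS under-sampling rule, so the following is only a minimal sketch of the generic baseline it builds on: SMOTE oversampling of the minority (churn) class followed by random under-sampling of the majority (non-churn) class, together with AUC computed as the Mann-Whitney statistic. All function and parameter names (`smote`, `random_undersample`, `hybrid_resample`, `auc`, `n_synthetic`, `k`) are hypothetical. The `smote` function also simplifies the original algorithm by picking base samples at random instead of cycling through every minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """SMOTE-style oversampling: interpolate between a minority sample and one
    of its k nearest minority neighbours (Euclidean distance).
    Simplification (assumption): base samples are drawn at random."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Random under-sampling: keep n_keep majority samples chosen uniformly
    without replacement. This is the step that may discard useful data."""
    rng = rng or random.Random(0)
    return rng.sample(majority, min(n_keep, len(majority)))


def hybrid_resample(majority, minority, n_synthetic, k=5, rng=None):
    """Baseline hybrid (NOT the paper's SOS-BUS): SMOTE the minority class,
    then randomly under-sample the majority class to the same size.
    Labels: 1 = churn (minority), 0 = non-churn (majority)."""
    rng = rng or random.Random(0)
    new_minority = list(minority) + smote(minority, n_synthetic, k, rng)
    kept_majority = random_undersample(majority, len(new_minority), rng)
    X = kept_majority + new_minority
    y = [0] * len(kept_majority) + [1] * len(new_minority)
    return X, y


def auc(labels, scores):
    """AUC as the probability that a random positive is scored above a
    random negative (Mann-Whitney statistic); ties count as one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]  # non-churners
    minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]  # churners
    X, y = hybrid_resample(majority, minority, n_synthetic=20, rng=rng)
    print(y.count(0), y.count(1))  # 30 30
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In this sketch, an informed under-sampling method would replace `random_undersample` with a rule that decides which majority-class samples are worth keeping instead of choosing them uniformly at random.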

Cite This Paper

Uma R. Salunkhe, Suresh N. Mali, "A Hybrid Approach for Class Imbalance Problem in Customer Churn Prediction: A Novel Extension to Under-sampling", International Journal of Intelligent Systems and Applications (IJISA), Vol.10, No.5, pp.71-81, 2018. DOI:10.5815/ijisa.2018.05.08
