IJISA Vol. 7, No. 6, 8 May 2015
Keywords: Concept Drift, Ensemble, Homogeneity, Data Streams, Online Approaches
Various online learning algorithms have been developed to handle concept drift in data streams. We perform a detailed evaluation of these algorithms using new performance metrics: prequential accuracy, kappa statistic, CPU evaluation time, model cost, and memory usage. Experimental evaluation on a range of artificial and real-world datasets shows that the concept-drift algorithms classify new data instances with high accuracy even in a resource-constrained environment, irrespective of dataset size, drift type, or the presence of noise. We also present empirically the impact of several parameters (ensemble size, period value, threshold value, multiplicative factor, and the presence of noise) on all the key performance metrics.
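The prequential (test-then-train) accuracy mentioned above can be illustrated with a minimal sketch: each arriving instance is first used to test the current model and only then to update it. The `MajorityClass` learner and the toy stream below are illustrative stand-ins, not the algorithms evaluated in the paper.

```python
# Minimal sketch of prequential (test-then-train) evaluation.
# MajorityClass is a stand-in learner, not one of the paper's algorithms.
from collections import Counter

class MajorityClass:
    """Trivial online learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1

def prequential_accuracy(stream, learner):
    """Test each instance before training on it; return the running accuracy."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn(x, y)
        total += 1
    return correct / total

# Toy stream: 10 instances of class 'a', then an abrupt drift to class 'b'.
stream = [((0,), 'a')] * 10 + [((1,), 'b')] * 10
print(prequential_accuracy(stream, MajorityClass()))  # → 0.45
```

Because every instance is tested before it is learned, the metric penalizes slow adaptation after a drift, which is exactly why the trivial learner above scores poorly on the second half of the stream.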
Parneeta Sidhu, M.P.S. Bhatia, "Empirical Support for Concept Drifting Approaches: Results Based on New Performance Metrics", International Journal of Intelligent Systems and Applications (IJISA), vol. 7, no. 6, pp. 1-20, 2015. DOI: 10.5815/ijisa.2015.06.01