IJMECS Vol. 6, No. 6, 8 Jun. 2014
Cover page and Table of Contents: PDF (size: 677KB)
Full Text (PDF, 677KB), PP.58-70
Views: 0 Downloads: 0
Data Mining, Date Representation, Time Series, Discretization, Harmony Search Algorithm
The representation task in time series data mining has been a critical issue because the direct manipulation of continuous, high-dimensional data is extremely difficult to complete efficiently. One time series representation approach is a symbolic representation called the Symbolic Aggregate Approximation (SAX). The main function of SAX is to find the appropriate numbers of alphabet symbols and word size that represent the time series. The aim is to achieve the largest alphabet size and maximum word length with the minimum error rate. The purpose of this study is to propose an integrated approach for a symbolic time series data representation that attempts to improve SAX by improving alphabet and word size. The Relative Frequency (RF) binning method is employed to obtain alphabet size and is integrated with the proposed Multi-pitch Harmony Search (HSMPAR) algorithm to calculate the optimum alphabet and word size. RF is used because of its ability to obtain a sufficient number of intervals with a low error rate compared to other related techniques. HSMPAR algorithm is an optimization algorithm that randomly generates solutions for alphabet and word sizes and selects the best solutions. HS algorithms are compatible with multi-pitch adjustment. The integration of the RF and HSMPAR algorithms is developed to maximize information rather than to improve the error rate. The algorithms are tested on 20 standard time series datasets and are compared with the meta-heuristic algorithms GENEBLA and the original SAX algorithm. The experimental results show that the proposed method generates larger alphabet and word sizes and achieves a lower error rate than the compared methods. With larger alphabet and word sizes, the proposed method is capable of preserving important information.
Almahdi M. Ahmed, Azuraliza Abu Bakar, Abdul Razak Hamdan, "A Harmony Search Algorithm with Multi-pitch Adjustment Rate for Symbolic Time Series Data Representation", International Journal of Modern Education and Computer Science (IJMECS), vol.6, no.6, pp.58-70, 2014. DOI:10.5815/ijmecs.2014.06.08
[1]E. Keogh, S. Chu, D. Hart, M. Pazzani, An online algorithm for segmenting time series, in: Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, 2001, pp. 289-296.
[2]H.-G. Acosta-Mesa, F. Rechy-Ramírez, E. Mezura-Montes, N. Cruz-Ramírez, R. Hernández Jiménez, Application of time series discretization using evolutionary programming for classification of precancerous cervical lesions, Journal of Biomedical Informatics, (2014).
[3]C. Faloutsos, M. Ranganathan, Y. Manolopoulos, Fast subsequence matching in time-series databases, SIGMOD Rec., 23 (1994) 419-429.
[4]K.F. Chan, A. W, Efficient Mining of Partial Periodic Patterns in Time Series Database, in: Proceedings of the 15th International Conference on Data Engineering, IEEE Computer Society, 1999, pp. 106.
[5]J. Lin, E. Keogh, S. Lonardi, B. Chiu, A symbolic representation of time series, with implications for streaming algorithms, in: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, ACM, San Diego, California, 2003, pp. 2-11.
[6]C.-G.A.-M. Daniel-Alejandro Garc, Discretization of Time Series Dataset with a Genetic Search, in: Proceedings of the 8th Mexican International Conference on Artificial Intelligence, Springer-Verlag, Guanajuato, M\&\#233;xico, 2009, pp. 201-212.
[7]r. Fabian Mörchen, Alfred Ultsch, Optimizing time series discretization for knowledge discovery, in: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, ACM, Chicago, Illinois, USA, 2005, pp. 660-665.
[8]F. Rechy-Ram, G.A. Mesa, Efr, n. Mezura-Montes, N. Cruz-Ram, Times series discretization using evolutionary programming, in: Proceedings of the 10th international conference on Artificial Intelligence: advances in Soft Computing - Volume Part II, Springer-Verlag, Puebla, Mexico, 2011, pp. 225-234.
[9]A.M. Ahmed, A.A. Bakar, A.R. Hamdan, Harmony Search algorithm for optimal word size in symbolic time series representation, in: Data Mining and Optimization (DMO), 2011 3rd Conference on, 2011, pp. 57-62.
[10]A.B. Layeb, Seriel Rayene, A Novel Quantum Inspired Cuckoo Search Algorithm for Bin Packing Problem., International Journal of Information Technology & Computer Science Vol. 4 (2012) p58-67.
[11]X. Xi, E. Keogh, C. Shelton, L. Wei, C.A. Ratanamahatana, Fast time series classification using numerosity reduction, in: Proceedings of the 23rd international conference on Machine learning, ACM, Pittsburgh, Pennsylvania, 2006, pp. 1033-1040.
[12]B.-K. Yi, C. Faloutsos, Fast Time Sequence Indexing for Arbitrary Lp Norms, in: Proceedings of the 26th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., 2000, pp. 385-394.
[13]K. Chakrabarti, E. Keogh, S. Mehrotra, M. Pazzani, Locally adaptive dimensionality reduction for indexing large time series databases, ACM Trans. Database Syst., 27 (2002) 188-228.
[14]G.-L. Alexander, Discretization algorithm Time Series Based on Aplicaci'on in entropy and colposcopic data, Universidad Veracruzana, (2007).
[15]D. Garc, Acosta-Mesa, Discretization of Time Series Dataset with a Genetic Search, in: Proceedings of the 8th Mexican International Conference on Artificial Intelligence, Springer-Verlag, Guanajuato; Mxico, 2009, pp. 201-212.
[16]E. Dimitrova, McGee, J., Laubenbacher, E, Discretization of Time Series Data, eprint arXiv (2005).
[17]A.M. Ahmed, A.A. Bakar, A.R. Hamdan, Improved SAX time series data representation based on Relative Frequency and K-Nearest Neighbor Algorithm, in: Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on, 2010, pp. 1320-1325.
[18]A.A. Bakar, A.M. Ahmed, A.R. Hamdan, Discretization of time series dataset using relative frequency and K-nearest neighbor approach, in: Proceedings of the 6th international conference on Advanced data mining and applications: Part I, Springer-Verlag, Chongqing, China, 2010, pp. 193-201.
[19]J. Shieh, E. Keogh, iSAX: indexing and mining terabyte sized time series, in: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, Las Vegas, Nevada, USA, 2008, pp. 623-631.
[20]J. Lin, Y. Li, Finding Structural Similarity in Time Series Data Using Bag-of-Patterns Representation, in: Proceedings of the 21st International Conference on Scientific and Statistical Database Management, Springer-Verlag, New Orleans, LA, USA, 2009, pp. 461-477.
[21]F. M, A. Ultsch, Optimizing time series discretization for knowledge discovery, in: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, ACM, Chicago, Illinois, USA, 2005, pp. 660-665.
[22]H.G. Acosta Mesa, Nicandro, C.R., Daniel-Alejandro, G.-L, Entropy Based Linear Approximation Algorithm for Time Series Discretization, In: Advances in Artificial Intelligence and Applications, 32 (2005) 214–224.
[23]G.-L. Alexander, Algorithm discretization Time Series Based on Aplicaci'on in entropy and colposcopic data, Universidad Veracruzana (2007).
[24]B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom, Models and issues in data stream systems, in: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, ACM, Madison, Wisconsin, 2002, pp. 1-16.
[25]U.M.F.a.K.B. Irani, Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, (1993) 1022-1029.
[26]R.J. Larsen, M.L. Marx, An Introduction to Mathematical Statistics and Its Applications, Prentice-Hall, 1986.
[27]E. Keogh, Xi, X., Wei, L., R, C.A, The UCR Time Series Classiffication/Clustering Homepage http://www.cs.ucr.edu/~eamonn/time_series_data/, (2006).