IJITCS Vol. 7, No. 5, 8 Apr. 2015
Keywords: Feature Selection, Projection Pursuit, Dimensionality Reduction, Biomarkers
The selection of attributes becomes more important, but also more difficult, as the size and dimensionality of data sets grow, particularly in bioinformatics. Targeted Projection Pursuit (TPP) is a dimension reduction technique previously applied to visualising high-dimensional data; here it is applied to the problem of feature selection. The technique avoids searching the power set of possible feature combinations by using perceptron learning and attraction-repulsion algorithms to find projections that separate the classes in the data. The technique is tested on a range of gene expression data sets. The classification generalisation performance of the features selected by TPP is found to compare well with that of standard wrapper and filter approaches, its selection of features generalises more robustly than either, and its time efficiency scales to larger numbers of attributes better than standard searches.
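The idea sketched in the abstract — learn a linear projection that pulls samples toward well-separated per-class target points, then rank features by their weight in that projection — can be illustrated as follows. This is a minimal sketch, not the published algorithm: the target layout, learning rate, iteration count, and function name `tpp_feature_select` are all illustrative assumptions.

```python
import numpy as np

def tpp_feature_select(X, y, n_features=10, n_iter=500, lr=0.01):
    """Illustrative sketch of TPP-style feature selection.

    Learns a projection matrix P (d x 2) by perceptron-style updates so
    that the projected samples X @ P move toward per-class target points
    separated in the plane; features are then ranked by the magnitude of
    their corresponding rows of P. Hyperparameters and the target layout
    are assumptions for illustration, not the paper's settings.
    """
    n, d = X.shape
    classes = np.unique(y)
    # Place each class's target at a distinct point on the unit circle,
    # so the targets are maximally spread (a simple separation scheme).
    angles = 2 * np.pi * np.arange(len(classes)) / len(classes)
    targets = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    T = targets[np.searchsorted(classes, y)]   # (n, 2): target per sample
    P = np.random.default_rng(0).normal(scale=0.01, size=(d, 2))
    for _ in range(n_iter):
        E = T - X @ P                # residual: how far each sample is from its target
        P += lr * X.T @ E / n        # perceptron-style update toward the targets
    scores = np.linalg.norm(P, axis=1)         # contribution of each feature
    return np.argsort(scores)[::-1][:n_features]
```

Because each iteration costs only a matrix multiply rather than a search over feature subsets, this kind of projection-based selection avoids the combinatorial power-set search that wrapper methods face.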
Amir Enshaei, Joe Faith, "Feature Selection with Targeted Projection Pursuit", International Journal of Information Technology and Computer Science (IJITCS), vol. 7, no. 5, pp. 34-39, 2015. DOI: 10.5815/ijitcs.2015.05.05